Abstract. The problem of computing saddle points is important in certain
problems in numerical partial differential equations and computational
chemistry, and is often solved numerically by a minimization problem over a set
of mountain passes. We point out that a good global mountain pass algorithm
should have good local and global properties. Next, we define the parallel
distance, and show that the square of the parallel distance has a quadratic
property. We show how to design algorithms for the mountain pass problem
based on perturbing parameters of the parallel distance, and that methods
based on the parallel distance have midrange local and global properties.

1. Introduction

We begin with the definition of a mountain pass. 

Definition 1.1. (Mountain pass) Let $X$ be a topological space, and consider
$a, b \in X$. Let $\Gamma(a, b)$ be the set of continuous paths
$p : [0, 1] \to X$ such that $p(0) = a$ and $p(1) = b$. For a function
$f : X \to \mathbb{R}$, define an optimal mountain pass $p \in \Gamma(a, b)$
to be a minimizer of the problem

$$\inf_{p \in \Gamma(a,b)} \sup_{0 \le t \le 1} f \circ p(t). \tag{1.1}$$

The point $x$ is a critical point if $\nabla f(x) = 0$, and the critical point
$x$ is a saddle point if it is not a local maximizer or minimizer on $X$. The
value $f(x)$ is a critical value if $x$ is a critical point. We say that
$\bar{x}$ is a saddle point of mountain pass type if there is an open set $U$
containing $\bar{x}$ such that $\bar{x}$ lies in the closure of two path
connected components of $\{x \in U : f(x) < f(\bar{x})\}$. In the case where
$f$ is smooth and an optimal mountain pass $p : [0, 1] \to X$ exists, a
maximizer of $f$ on $p([0, 1])$ is a saddle point.



Date: November 20, 2012.

In this paper, we shall focus on the case where $X = \mathbb{R}^n$ and the
saddle point is nondegenerate. A saddle point $\bar{x}$ is said to be
nondegenerate if $\nabla^2 f(\bar{x})$ is invertible. Moreover, a nondegenerate
saddle point $\bar{x}$ has Morse index one if $\nabla^2 f(\bar{x})$ has exactly
one negative eigenvalue.

The problem of finding saddle points numerically is important in the problem of
finding weak solutions to partial differential equations numerically. The first
critical point existence theorems, now known as the mountain pass theorems, were
proved in [AR73, Rab77]. Some recent theoretical references include [MW89,
Rab86, Sch99, Str08, Wil96]. See also the more accessible reference [Jab03]. The
original paper on a mountain pass algorithm to solve partial differential
equations is [CM93], and it contains several semilinear elliptic problems.
Particular applications in numerical partial differential equations include
finding periodic solutions of a boundary value problem modeling a suspension
bridge [Fen94] (introduced by [LM91]), studying a system of Ginzburg-Landau type
equations arising in the thin film model of superconductivity [GM08], the
choreographical 3-body problem [ABT06], and cylinder buckling [HLP06]. Other
notable works in computing saddle points for solving numerical partial
differential equations include the use of constrained optimization [Hor04],
extending the mountain pass algorithm to find saddle points of higher Morse
index [PCC99, LZ01], extending the mountain pass algorithm to find nonsmooth
saddle points [YZ05], and using symmetry [WZ04, WZ05].

The problem of finding saddle points numerically is by now well entrenched in
the chemistry curriculum. In transition state theory, the problem of finding the
least amount of energy to transition between two stable states is equivalent to
finding an optimal mountain pass between these two stable states. The highest
point on the optimal mountain pass can then be used to determine the reaction
kinetics. The foundations of transition state theory were laid by Marcelin, and
important work by Eyring and Polanyi in 1931 and by Pelzer and Wigner a year
later established the importance of saddle points in transition state theory. We
cite the Wikipedia entry on transition state theory for more on its history and
further references. Numerous methods for computing saddle points were suggested
through the years, and we refer to the surveys [HJJ00, HS05, Sch11, Wal06] as
well as the recent text [Wal03]. One software package for computing saddle
points in chemistry is Gaussian. Tools for computing transition states are also
included in VASP. Though the entire optimal mountain pass is needed for such an
application, the process of computing saddle points often gives hints on an
optimal mountain pass.

As mentioned in [LP11], our initial interest in the problem of computing saddle
points of mountain pass type comes from computing the distance of a matrix
$A \in \mathbb{C}^{n \times n}$ to the closest matrix with repeated eigenvalues
(also known as the Wilkinson distance problem).

We recall three broad methods for computing the mountain pass: 

Path-based methods. The typical mountain pass algorithm makes use of the
formula in (1.1) to find a saddle point. The paths in $\Gamma(a, b)$ are
discretized, and perturbed so that the maximum value of $f$ along the path is
reduced. The point on an optimizing path attaining the maximum value is a good
estimate of the critical point. See Figure 1.1.

http://www.gaussian.com
http://theory.cm.utexas.edu/vtsttools/neb/
http://cms.mpi.univie.ac.at/vasp/vasp/vasp.html

Figure 1.1. The diagram on the left illustrates a path-based method, while the
diagram on the right illustrates a level set method, as was done in [LP11] and
this paper.



Quadratic model methods. Once the iterates are close enough to the saddle
point $\bar{x}$, the quadratic expansion

$$f(x) = \tfrac{1}{2}(x - \bar{x})^T \nabla^2 f(\bar{x})(x - \bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + f(\bar{x}) + o(\|x - \bar{x}\|^2) \tag{1.2}$$

can form the basis of algorithms that converge quickly to the saddle point. A
Newton method can achieve quadratic convergence to the saddle point, and its
variants can achieve fast convergence. The gradient $\nabla f(x)$ has close to
linear behavior, and other methods involving solving the linear system are also
possible.
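As a rough illustration of a quadratic model method, the sketch below applies a plain Newton iteration to the gradient equation of a two-variable function with a nondegenerate saddle point of Morse index one at the origin. The test function $f(x_1, x_2) = (x_2 - x_1^2)(x_1 - x_2^2)$, the starting point, and the names `grad`, `hess` and `newton_saddle` are choices made here for illustration only; they are not taken from any referenced implementation.

```python
import math

def grad(x):
    """Gradient of the illustrative f(x1, x2) = (x2 - x1**2) * (x1 - x2**2)."""
    x1, x2 = x
    return [x2 - 3 * x1**2 + 2 * x1 * x2**2,
            x1 - 3 * x2**2 + 2 * x1**2 * x2]

def hess(x):
    """Hessian of the same illustrative function."""
    x1, x2 = x
    off = 1 + 4 * x1 * x2
    return [[-6 * x1 + 2 * x2**2, off],
            [off, -6 * x2 + 2 * x1**2]]

def newton_saddle(x0, tol=1e-12, max_iter=50):
    """Newton iteration on grad f = 0; only a local method (assumes a good start)."""
    x = list(x0)
    for k in range(max_iter):
        g = grad(x)
        if math.hypot(g[0], g[1]) < tol:
            return x, k
        (a, b), (c, d) = hess(x)
        det = a * d - b * c
        # Solve H s = -g by Cramer's rule (2 x 2 case only).
        s1 = (-g[0] * d + b * g[1]) / det
        s2 = (-a * g[1] + c * g[0]) / det
        x = [x[0] + s1, x[1] + s2]
    return x, max_iter

if __name__ == "__main__":
    x_star, iters = newton_saddle([0.1, -0.05])
    print(x_star, iters)
```

Started far from the saddle point, the same iteration may diverge or land on a different critical point, which is the sense in which such methods do not address global behavior.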

Level set methods. In [LP11, MF01], a different strategy of using level sets

$$\mathrm{lev}_{\le l} f := \{x : f(x) \le l\}$$

is suggested: For a neighborhood $U$ of the critical point $\bar{x}$ and an
increasing sequence of $l_i$ converging to the critical value $f(\bar{x})$, find
the closest points in different components of $U \cap \mathrm{lev}_{\le l_i} f$,
say $x_i$ and $y_i$. Figure 1.1 contrasts path-based methods and level set
methods. Under additional conditions, $\{x_i\}_{i=1}^{\infty}$ and
$\{y_i\}_{i=1}^{\infty}$ both converge to $\bar{x}$. An optimal mountain pass
can be estimated from the iterates $\{x_i\}_{i=1}^{\infty}$ and
$\{y_i\}_{i=1}^{\infty}$. Advantages of level set methods over path-based
methods include:
(A1) The level set method needs only to keep track of two points at each step
     instead of an entire path.
(A2) The bulk of computations are performed near the saddle point.
(A3) The distance between the components of the level set indicates the
     performance of the algorithm.
(A4) Provided black boxes for finding closest points to components of the level
     set and for the minimization of the function $f$ on an affine space exist,
     an algorithm locally superlinearly convergent to the critical point is
     described in [LP11]. See also (D1) in Section 3.1.

However, there are some difficulties encountered in the level set algorithm in
[LP11], which we elaborate on in Section 3.

One contribution we make in this paper is to identify properties desirable for a
global mountain pass algorithm. Specifically, we propose these two principles:

(P1) Suppose $f \in C^2$. Once the iterates are close enough to a nondegenerate
     saddle point of Morse index one, the algorithm should converge quickly to
     the saddle point $\bar{x}$.

(P2) The global algorithm should find a saddle point of mountain pass type.

The analogy to Principle (P2) in optimization is to seek decrease so that
iterates converge to a local minimizer. Principle (P1) states that the algorithm
should have fast convergence once close enough to a saddle point. Related to
Principle (P1) is Principle (P1') below.
(P1') For the quadratic $f(x) = \tfrac{1}{2} x^T H x + g^T x + c$, where $H$ is
      an invertible symmetric matrix with one negative eigenvalue and $n - 1$
      positive eigenvalues, the algorithm should have excellent convergence.
We make a short summary of the performance of the various mountain pass
algorithms. Path-based methods excel in (P2) due to the proof of the mountain
pass theorem of [AR73] using the Ekeland variational principle. More
specifically, under suitable conditions, if $p_i(\cdot)$ is a sequence of paths
in $\Gamma(a, b)$ such that $\max_{t \in [0,1]} f \circ p_i(t)$ converges to the
critical level, then the sequence of maximizers of $f$ along the paths
$p_i(\cdot)$ converges to a saddle point. However, path-based methods do poorly
for (P1) and (P1') because they do not take advantage of the quadratic
approximation (1.2) to achieve fast convergence. On the other hand, methods that
make extensive use of the quadratic approximation (1.2) excel for (P1) and
(P1'), but do not satisfy (P2) because the quadratic approximation need not be
valid globally.

Another contribution of this paper is to argue that level set methods should be
part of a good mountain pass algorithm because they do well for the Principles
(P1), (P1') and (P2).

We also show how the parallel distance defined below can be part of a good
mountain pass algorithm. For a set $C \subset \mathbb{R}^n$, its diameter
$\mathrm{diam}(C)$ is defined by
$\mathrm{diam}(C) := \sup\{|x - y| : x, y \in C\}$.

Definition 1.2. (Parallel distance) Let $f : \mathbb{R}^n \to \mathbb{R}$ be
$C^2$ in a convex neighborhood $U'$, and let $v$ be a unit vector. See Figure
1.2. Consider the set $S_{l,v}(x) \subset \mathbb{R}^n$ defined by

$$S_{l,v}(x) := U' \cap [\{x\} + \mathbb{R}\{v\}] \cap \mathrm{lev}_{\ge l} f.$$

For a neighborhood $U$ of $\bar{x}$ such that $U \subset U'$, define the
parallel distance $g_{l,v} : U \to \mathbb{R}$ by

$$g_{l,v}(x) := \mathrm{diam}(S_{l,v}(x)).$$

When $S_{l,v}(x) = \emptyset$, $g_{l,v}(x) = 0$. In the case where $S_{l,v}(x)$
is a line segment, we can write $g_{l,v}(x)$ as

$$g_{l,v}(x) = g_{l,v,1}(x) + g_{l,v,2}(x), \tag{1.3}$$

where

$$g_{l,v,1}(x) = \max\{ v^T z \mid f(z) = l,\ z \in S_{l,v}(x)\}, \tag{1.4a}$$

$$\text{and } g_{l,v,2}(x) = \max\{ -v^T z' \mid f(z') = l,\ z' \in S_{l,v}(x)\}. \tag{1.4b}$$

Also, define $z(x)$ and $z'(x)$ as

$$z(x) = \arg\max\{ v^T z \mid f(z) = l,\ z \in S_{l,v}(x)\},$$

$$\text{and } z'(x) = \arg\max\{ -v^T z' \mid f(z') = l,\ z' \in S_{l,v}(x)\}.$$
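To make Definition 1.2 concrete, here is a minimal numerical sketch that evaluates the parallel distance when $t \mapsto f(x + tv)$ is concave on a search interval. It locates a maximizer along the line by ternary search and then finds the two points where $f = l$ by bisection. The function name `parallel_distance`, the `radius` parameter (standing in for the neighborhood $U'$), and the test quadratic are assumptions made for this example.

```python
def parallel_distance(f, x, v, level, radius=10.0):
    """Length of {t in [-radius, radius] : f(x + t v) >= level}.

    Assumes v has unit length and t -> f(x + t v) is concave on the interval.
    """
    def phi(t):
        return f([xi + t * vi for xi, vi in zip(x, v)]) - level

    # Ternary search for a maximizer of the concave function phi.
    lo, hi = -radius, radius
    for _ in range(200):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) < phi(m2):
            lo = m1
        else:
            hi = m2
    t_max = 0.5 * (lo + hi)
    if phi(t_max) <= 0.0:
        return 0.0  # empty set or a single point: distance zero
    if phi(radius) > 0.0 or phi(-radius) > 0.0:
        raise ValueError("search interval does not bracket the level set")

    def root(a, b):
        # Bisection keeping phi(a) > 0 >= phi(b).
        for _ in range(200):
            m = 0.5 * (a + b)
            if phi(m) > 0.0:
                a = m
            else:
                b = m
        return 0.5 * (a + b)

    return root(t_max, radius) - root(t_max, -radius)

if __name__ == "__main__":
    saddle = lambda p: 0.5 * (p[0] ** 2 - p[1] ** 2)  # saddle point at the origin
    print(parallel_distance(saddle, [0.0, 0.0], [0.0, 1.0], -0.5))
```

For the quadratic used in the demo, the segment through $x$ in direction $v = (0, 1)$ at level $l = -\tfrac12$ has length $2\sqrt{1 + x_1^2}$, which gives a simple check of the routine.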

One step of the mountain pass algorithm in [LP11] is to find the closest points
between components of the level sets. The problem of finding the closest points
between two sets is not necessarily easy, and an alternating projection
algorithm converges slowly once close to the optimum points. We will show that
as long as $v$ is close enough to the eigenvector corresponding to the negative
eigenvalue of the Hessian of the saddle point, the square of the parallel
distance satisfies property (P1). This allows us to get around the problem of
finding the closest points between components of the level sets.

Figure 1.2. Illustration of $g_{l,v}(x)$ in Definition 1.2.

1.1. Outline of paper. Section 2 discusses various basic properties of the
parallel distance. The topics discussed are: how the square of the parallel
distance satisfies (P1'), formulas for the gradient and Hessian of the parallel
distance $g_{l,v}(\cdot)$ and its square $g_{l,v}(\cdot)^2$, and why it is
preferable to consider $g_{l,v}(\cdot)^2$ for the smooth problem instead of
$g_{l,v}(\cdot)$. Section 3 proposes subroutines for a mountain pass algorithm,
and discusses how to use these subroutines to design a mountain pass algorithm
with midrange local and global properties. Section 4 shows that the Hessian
$\nabla^2(g_{l,v}^2)(\cdot)$ is close to the Hessian as predicted by a quadratic
model. This shows that the Hessian $\nabla^2(g_{l,v}^2)(\cdot)$ is not sensitive
to $l$ as the computations get close to the saddle point, making old estimates
of $\nabla^2(g_{l,v}^2)(\cdot)$ useful for future computations involving a
different $l$. Section 5 shows how our algorithm performs in an implementation.

2. Basic properties of the parallel distance 

In this section, we study basic properties of the parallel distance function.
When $f$ is an exact quadratic whose critical point is nondegenerate of Morse
index one, we have the following appealing result.

Proposition 2.1. (Quadratic formula for square of parallel distance in exact
quadratic) Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is an exact quadratic
$f(x) = \tfrac{1}{2} x^T H x + g^T x + c$, with
$H \in \mathbb{R}^{n \times n}$ having $n - 1$ positive eigenvalues and one
negative eigenvalue. Consider a unit vector $v$ such that $v^T H v < 0$. Then
$S_{l,v}(x)$ is a line segment, and the function $g_{l,v}(\cdot)$ takes the form
(1.3). Additionally, we have

$$g_{l,v}(x)^2 = \max\Big\{0,\ \frac{4}{(v^T H v)^2}\Big[ x^T[Hvv^TH - (v^THv)H]x + 2[(g^Tv)v^TH - (v^THv)g^T]x + [(g^Tv)^2 + (v^THv)[-2c + 2l]]\Big]\Big\}. \tag{2.1}$$

For $g_{l,v}(x) > 0$ and $v$ sufficiently close to the eigenvector corresponding
to the negative eigenvalue of $H$, the matrix $\nabla^2(g_{l,v}^2)(x)$ has
$n - 1$ positive eigenvalues and one zero eigenvalue. The function $g_{l,v}^2$
is convex. Moreover, let $\bar{x}$ be the saddle point $-H^{-1}g$. If
$l < f(\bar{x})$ and $g_{l,v}^2$ has a minimizer $\tilde{x}$, then $\bar{x}$ is
the midpoint of the intersection of the line $\{\tilde{x}\} + \mathbb{R}\{v\}$
and $\mathrm{lev}_{=l} f$.

Proof. For the case of the quadratic $f$, the neighborhoods $U$ and $U'$ can be
taken to be $\mathbb{R}^n$. The value $g_{l,v}(x)$ can be computed as follows.
At $x$ where $g_{l,v}(x) > 0$, let $x + t_i v$, where $t_i \in \mathbb{R}$ and
$i = 1, 2$, be the two points of intersection of the line
$\{x\} + \mathbb{R}\{v\}$ and the curve $\mathrm{lev}_{=l} f$. The $t_i$'s can
be calculated as follows:

$$\tfrac{1}{2}(x + t_i v)^T H (x + t_i v) + g^T(x + t_i v) + c = l$$

$$\Rightarrow\ (v^THv)t_i^2 + [2(v^THx) + 2g^Tv]t_i + x^THx + 2g^Tx + 2c - 2l = 0.$$

We have

$$t_i = \frac{-2(v^THx) - 2g^Tv \pm \sqrt{4(v^THx + g^Tv)^2 - 4(v^THv)(x^THx + 2g^Tx + 2c - 2l)}}{2(v^THv)}.$$

This gives

$$g_{l,v}(x) = \frac{-2}{v^THv}\sqrt{(v^THx + g^Tv)^2 - (v^THv)(x^THx + 2g^Tx + 2c - 2l)},$$

$$g_{l,v}(x)^2 = \frac{4}{(v^THv)^2}\Big[(v^THx + g^Tv)^2 - (v^THv)(x^THx + 2g^Tx + 2c - 2l)\Big]$$

$$= \frac{4}{(v^THv)^2}\Big[x^T[Hvv^TH - (v^THv)H]x + 2[(g^Tv)v^TH - (v^THv)g^T]x + [(g^Tv)^2 + (v^THv)[-2c + 2l]]\Big].$$

Taking into account the fact that $g_{l,v}(x)$ can equal zero, $g_{l,v}(x)^2$
has the formula as given in (2.1). For the case when $v = \bar{v}$, the
eigenvector corresponding to the negative eigenvalue of $H$, we find that
$\bar{v}$ is the eigenvector corresponding to the zero eigenvalue for the
Hessian

$$\nabla^2(g_{l,\bar{v}}^2)(x) = \frac{8}{(\bar{v}^TH\bar{v})^2}[H\bar{v}\bar{v}^TH - (\bar{v}^TH\bar{v})H].$$

The other eigenvalues of $\nabla^2(g_{l,\bar{v}}^2)(x)$ can easily be calculated
to be $-8\lambda_i/\lambda_n$ for $i = 1, \dots, n - 1$, where the
$\lambda_i$'s are the eigenvalues of $H$ arranged in decreasing order.

Note that $v$ is an eigenvector corresponding to eigenvalue zero of
$\nabla^2(g_{l,v}^2)(x)$. Recall that the eigenvalues depend continuously on the
matrix entries. If the unit vector $v$ is sufficiently close to $\bar{v}$, then
the Hessian $\nabla^2(g_{l,v}^2)(x)$ has one zero eigenvalue and $n - 1$
positive eigenvalues. The convexity of $g_{l,v}^2(\cdot)$ is clear.

In the case where $l < f(\bar{x})$, it is easy to check that
$\bar{x} = -H^{-1}g$ is a minimizer of $g_{l,v}^2(\cdot)$. The other claims are
easy. □

Proposition 2.1 says that when $f$ is quadratic, then $g_{l,v}(\cdot)^2$ is also
a quadratic wherever it is positive, so a mountain pass algorithm based on the
parallel distance will satisfy (P1').
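The closed form in Proposition 2.1 can be sanity-checked numerically. The sketch below evaluates the expanded expression for the squared parallel distance and compares it with the squared gap between the two roots $t_1, t_2$ of the quadratic in $t$ from the proof. The helper names and the particular $H$, $g$, $c$, $v$, $x$ and $l$ are arbitrary illustrative choices, and only the case $v^THv < 0$ is handled.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

def f_quad(H, g, c, x):
    """f(x) = 0.5 x^T H x + g^T x + c."""
    return 0.5 * dot(x, matvec(H, x)) + dot(g, x) + c

def g_squared_formula(H, g, c, v, level, x):
    """Expanded closed form for the squared parallel distance (needs v^T H v < 0)."""
    a = dot(v, matvec(H, v))
    vHx = dot(v, matvec(H, x))
    quad = vHx * vHx - a * dot(x, matvec(H, x))   # x^T [H v v^T H - (v^T H v) H] x
    lin = 2.0 * (dot(g, v) * vHx - a * dot(g, x))  # 2 [(g^T v) v^T H - (v^T H v) g^T] x
    const = dot(g, v) ** 2 + a * (-2.0 * c + 2.0 * level)
    return max(0.0, 4.0 * (quad + lin + const) / (a * a))

def chord_parameters(H, g, c, v, level, x):
    """Roots t1, t2 of f(x + t v) = level, or None if the line misses the level set."""
    a = dot(v, matvec(H, v))
    b = dot(v, matvec(H, x)) + dot(g, v)
    q = dot(x, matvec(H, x)) + 2.0 * dot(g, x) + 2.0 * c - 2.0 * level
    disc = b * b - a * q
    if disc < 0.0:
        return None
    r = math.sqrt(disc)
    return (-b + r) / a, (-b - r) / a

if __name__ == "__main__":
    H = [[2.0, 1.0], [1.0, -3.0]]
    g, c, level = [0.5, -1.0], 0.2, -1.0
    n = math.sqrt(1.04)
    v = [0.2 / n, 1.0 / n]
    x = [0.3, -0.4]
    t1, t2 = chord_parameters(H, g, c, v, level, x)
    print((t1 - t2) ** 2, g_squared_formula(H, g, c, v, level, x))
```

The two printed numbers should agree up to rounding, since both describe the same chord of the level set along the direction $v$.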

We next show that the parallel distance behaves well near the saddle point
$\bar{x}$ of Morse index one.

Proposition 2.2. (Behavior near saddle point) Let
$f : \mathbb{R}^n \to \mathbb{R}$ be $C^2$ in a neighborhood of a nondegenerate
saddle point $\bar{x}$ of Morse index one, and $\bar{v}$ be the eigenvector of
unit length corresponding to the negative eigenvalue of $\nabla^2 f(\bar{x})$.
For $\delta > 0$, define $f_\delta : \mathbb{R}^n \to \mathbb{R}$ and
$S_{\delta,l,v}(x)$ by

$$f_\delta(x) := \tfrac{1}{2}(x - \bar{x})^T[\nabla^2 f(\bar{x}) + \delta I](x - \bar{x}) + f(\bar{x}), \tag{2.2}$$

$$\text{and } S_{\delta,l,v}(x) := [\{x\} + \mathbb{R}\{v\}] \cap \mathrm{lev}_{\ge l} f_\delta.$$

There is a neighborhood $U'$ of $\bar{x}$ and $\epsilon > 0$ such that:

(1) $\|\nabla^2 f(x) - \nabla^2 f(\bar{x})\| < \delta$ for all $x \in U'$.

(2) If $\|v - \bar{v}\| < \epsilon$, then the map $t \mapsto f(x + tv)$, where
    $t \in \mathbb{R}$, is concave wherever $x + tv \in U'$. Hence $S_{l,v}(x)$
    is either a line segment or an empty set.

(3) If $v$ is a unit vector satisfying $\|v - \bar{v}\| < \epsilon$ and
    $|l - f(\bar{x})| < \epsilon$, then for all
    $x \in \mathbb{B}_\epsilon(\bar{x})$, we have
    $S_{l,v}(x) \cap U' \subset S_{\delta,l,v}(x) \subset U'$ (which includes
    the case $S_{l,v}(x) = \emptyset$).

Proof. The statement (1) holds for some neighborhood $U'$ of $\bar{x}$. We can
shrink $U'$ if necessary so that $\bar{v}^T \nabla^2 f(x) \bar{v} < 0$ for all
$x \in U'$, and an $\epsilon > 0$ can be found so that (2) is satisfied.

Choose $\gamma > 0$ such that $\mathbb{B}_\gamma(\bar{x}) \subset U'$. Then
condition (1) ensures that $f(x) \le f_\delta(x)$ for all $x \in U'$, so
$S_{l,v}(x) \cap U' \subset S_{\delta,l,v}(x)$. The line segment
$S_{\delta,l,v}(x)$ is of the form $[x + t_1 v, x + t_2 v]$, whose endpoints
$x + t_i v$, $i = 1, 2$, can be calculated using the quadratic formula employed
in the proof of Proposition 2.1, giving us

$$t_i = -\frac{v^T H_\delta (x - \bar{x})}{v^T H_\delta v} \pm \frac{\sqrt{4[v^T H_\delta (x - \bar{x})]^2 - 4[v^T H_\delta v][(x - \bar{x})^T H_\delta (x - \bar{x}) + 2f(\bar{x}) - 2l]}}{2[v^T H_\delta v]},$$

where $H_\delta = \nabla^2 f(\bar{x}) + \delta I$. The formula above is
continuous in $v$, $x$ and $l$ whenever $t_i$ is real, and as $x \to \bar{x}$
and $l \to f(\bar{x})$, we have $t_i \to 0$. From
$\|x + t_i v - \bar{x}\| \le \|x - \bar{x}\| + |t_i|$, we can choose $\epsilon$
small enough so that if $\|v - \bar{v}\| < \epsilon$,
$|l - f(\bar{x})| < \epsilon$ and $x \in \mathbb{B}_\epsilon(\bar{x})$, then
$\|x + t_i v - \bar{x}\| < \gamma$, giving us
$S_{\delta,l,v}(x) \subset \mathbb{B}_\gamma(\bar{x}) \subset U'$. This means
that condition (3) holds. □

The expression (1.3) gives us a way to calculate derivatives of the parallel
distance. We have the following results.

Lemma 2.3. (Gradient and Hessian of $g_{l,v}$) Let
$f : \mathbb{R}^n \to \mathbb{R}$ be $C^2$ everywhere. Recall the function
$g_{l,v} : \mathbb{R}^n \to \mathbb{R}$ and the neighborhoods $U$ and $U'$ on
which $g_{l,v}$ is defined. Suppose that $g_{l,v}$ can be represented as (1.3).
Let $z(x)$ and $z'(x)$ be the respective maximizers in the definitions of
$g_{l,v,1}$ and $g_{l,v,2}$ in (1.4a) and (1.4b). Then, provided
$\nabla f(z(x))^T v \ne 0$ and $\nabla f(z'(x))^T v \ne 0$, we have

$$\nabla g_{l,v}(x) = -\frac{\nabla f(z(x))}{\nabla f(z(x))^T v} + \frac{\nabla f(z'(x))}{\nabla f(z'(x))^T v}.$$

To simplify the notation, we suppress the dependence of $z$ and $z'$ on $x$. We
also have

$$\nabla^2 g_{l,v}(x) = -\Big(I - \frac{\nabla f(z)v^T}{v^T\nabla f(z)}\Big)\frac{\nabla^2 f(z)}{v^T\nabla f(z)}\Big(I - \frac{\nabla f(z)v^T}{v^T\nabla f(z)}\Big)^T + \Big(I - \frac{\nabla f(z')v^T}{v^T\nabla f(z')}\Big)\frac{\nabla^2 f(z')}{v^T\nabla f(z')}\Big(I - \frac{\nabla f(z')v^T}{v^T\nabla f(z')}\Big)^T.$$



Proof. Write $F(d, t) := f(x + tv + d) = f((x + \bar{t}v) + d + (t - \bar{t})v)$.
We evaluate the partial derivatives of $F$ at $(0, \bar{t})$ to be

$$\nabla_d F(0, \bar{t}) = \nabla f(x + \bar{t}v) \quad\text{and}\quad \nabla_t F(0, \bar{t}) = \nabla f(x + \bar{t}v)^T v.$$

For each $d$, we can find $t$ such that $F(d, t) = l$. By the implicit function
theorem, the derivative of $t$ with respect to $d$ equals

$$-\frac{\nabla f(x + \bar{t}v)}{\nabla f(x + \bar{t}v)^T v} = -\frac{\nabla f(z(x))}{\nabla f(z(x))^T v},$$

provided the denominator is nonzero. From this and the fact that
$g_{l,v,1}(\cdot)$ and $g_{l,v,2}(\cdot)$ are constant when moving in the
direction $v$, we get

$$\nabla g_{l,v,1}(x) = -\frac{\nabla f(z(x))}{\nabla f(z(x))^T v} + v. \tag{2.3}$$

Similarly, we have
$\nabla g_{l,v,2}(x) = \frac{\nabla f(z'(x))}{\nabla f(z'(x))^T v} - v$. The
formula for $\nabla g_{l,v}$ is easily deduced.

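Next, we calculate $\nabla^2 g_{l,v}$ by first calculating
$\nabla^2 g_{l,v,1}$ and $\nabla^2 g_{l,v,2}$. To reduce notation, we suppress
the dependence of $z$ and $z'$ on $x$. Taking the $m$th component of (2.3) gives

$$\frac{\partial g_{l,v,1}}{\partial x_m}(x) = -\frac{1}{v^T\nabla f(z)}\frac{\partial f(z)}{\partial x_m} + v_m,$$

so

$$\frac{\partial}{\partial x_{m'}}\Big(\frac{\partial g_{l,v,1}}{\partial x_m}\Big)(x) = -\frac{\frac{\partial}{\partial x_{m'}}\big(\frac{\partial f(z)}{\partial x_m}\big)}{v^T\nabla f(z)} + \frac{\frac{\partial f(z)}{\partial x_m}\,\frac{\partial}{\partial x_{m'}}\big(v^T\nabla f(z)\big)}{[v^T\nabla f(z)]^2}.$$

Note that $z(x) = x + g_{l,v,1}(x)v - vv^Tx$ and
$z'(x) = x - g_{l,v,2}(x)v - vv^Tx$. We use the notation $1_{\{a=b\}}$ to mean

$$1_{\{a=b\}} = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise.} \end{cases}$$

So

$$\frac{\partial z_k}{\partial x_{m'}} = 1_{\{m'=k\}} + \frac{\partial g_{l,v,1}(x)}{\partial x_{m'}}v_k - v_{m'}v_k, \quad\text{and}\quad \frac{\partial z'_k}{\partial x_{m'}} = 1_{\{m'=k\}} - \frac{\partial g_{l,v,2}(x)}{\partial x_{m'}}v_k - v_{m'}v_k.$$

So by the multi-variable chain rule we have

$$\frac{\partial}{\partial x_{m'}}\Big(\frac{\partial g_{l,v,1}}{\partial x_m}\Big)(x) = -\frac{\sum_{k=1}^n \frac{\partial^2 f(z)}{\partial x_k \partial x_m}\Big[1_{\{m'=k\}} + \frac{\partial g_{l,v,1}(x)}{\partial x_{m'}}v_k - v_{m'}v_k\Big]}{v^T\nabla f(z)} + \frac{\frac{\partial f(z)}{\partial x_m}\sum_{k=1}^n v_k \sum_{k'=1}^n \frac{\partial^2 f(z)}{\partial x_k \partial x_{k'}}\Big[1_{\{k'=m'\}} + \frac{\partial g_{l,v,1}(x)}{\partial x_{m'}}v_{k'} - v_{m'}v_{k'}\Big]}{[v^T\nabla f(z)]^2}.$$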
Now we have

$$\nabla^2 g_{l,v,1}(x) = -\frac{\nabla^2 f(z) + \nabla^2 f(z)v[\nabla g_{l,v,1}(x)^T - v^T]}{v^T\nabla f(z)} + \frac{\nabla f(z)v^T\nabla^2 f(z) + \nabla f(z)v^T\nabla^2 f(z)v[\nabla g_{l,v,1}(x)^T - v^T]}{[v^T\nabla f(z)]^2}.$$

Substituting $\nabla g_{l,v,1}(x) = -\frac{\nabla f(z)}{v^T\nabla f(z)} + v$, we
get:

$$\nabla^2 g_{l,v,1}(x) = -\frac{\nabla^2 f(z)}{v^T\nabla f(z)} + \frac{\nabla^2 f(z)v\nabla f(z)^T + \nabla f(z)v^T\nabla^2 f(z)}{[v^T\nabla f(z)]^2} - \frac{\nabla f(z)v^T\nabla^2 f(z)v\nabla f(z)^T}{[v^T\nabla f(z)]^3}.$$

The formula for $\nabla^2 g_{l,v,2}(x)$ is similar, and the formula for
$\nabla^2 g_{l,v}(x)$ follows readily. □

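The formulas for $\nabla(g_{l,v}^2)$ and $\nabla^2(g_{l,v}^2)$ can now be
calculated, as is done below.

Proposition 2.4. (Gradient and Hessian of $g_{l,v}^2$) Given the conditions in
Lemma 2.3, the formulas for $\nabla(g_{l,v}^2)(x)$ and
$\nabla^2(g_{l,v}^2)(x)$ are
$\nabla(g_{l,v}^2)(x) = 2g_{l,v}(x)\nabla g_{l,v}(x)$ and

$$\nabla^2(g_{l,v}^2)(x) = 2\Big(-\frac{\nabla f(z)}{\nabla f(z)^Tv} + \frac{\nabla f(z')}{\nabla f(z')^Tv}\Big)\Big(-\frac{\nabla f(z)}{\nabla f(z)^Tv} + \frac{\nabla f(z')}{\nabla f(z')^Tv}\Big)^T$$

$$\quad + 2g_{l,v}(x)\Big[-\Big(I - \frac{\nabla f(z)v^T}{v^T\nabla f(z)}\Big)\frac{\nabla^2 f(z)}{v^T\nabla f(z)}\Big(I - \frac{\nabla f(z)v^T}{v^T\nabla f(z)}\Big)^T + \Big(I - \frac{\nabla f(z')v^T}{v^T\nabla f(z')}\Big)\frac{\nabla^2 f(z')}{v^T\nabla f(z')}\Big(I - \frac{\nabla f(z')v^T}{v^T\nabla f(z')}\Big)^T\Big].$$

Proof. We have
$\frac{\partial (g_{l,v}^2)}{\partial x_m}(x) = 2g_{l,v}(x)\frac{\partial g_{l,v}}{\partial x_m}(x)$,
so $\nabla(g_{l,v}^2)(x) = 2g_{l,v}(x)\nabla g_{l,v}(x)$. Also,

$$\frac{\partial}{\partial x_{m'}}\frac{\partial (g_{l,v}^2)}{\partial x_m}(x) = 2\frac{\partial g_{l,v}(x)}{\partial x_{m'}}\frac{\partial g_{l,v}(x)}{\partial x_m} + 2g_{l,v}(x)\frac{\partial^2 g_{l,v}(x)}{\partial x_{m'}\partial x_m}.$$

We thus have

$$\nabla^2(g_{l,v}^2)(x) = 2\nabla g_{l,v}(x)\nabla g_{l,v}(x)^T + 2g_{l,v}(x)\nabla^2 g_{l,v}(x),$$

which gives the formula for $\nabla^2(g_{l,v}^2)(x)$. □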
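As a hedged check of the gradient formula, the sketch below evaluates $\nabla(g_{l,v}^2)(x) = 2g_{l,v}(x)\nabla g_{l,v}(x)$ from the gradients of $f$ at the two endpoints $z$ and $z'$, and compares it with central finite differences of $g_{l,v}^2$. It uses an exact quadratic so that $z$ and $z'$ are available in closed form. The data `H`, `G`, `C`, the test point and every function name are illustrative assumptions, not part of the original text.

```python
import math

H = [[2.0, 1.0], [1.0, -3.0]]   # one positive and one negative eigenvalue
G = [0.5, -1.0]
C = 0.2

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(M, x):
    return [dot(row, x) for row in M]

def grad_f(x):
    """Gradient of f(x) = 0.5 x^T H x + G^T x + C."""
    return [hx + gi for hx, gi in zip(matvec(H, x), G)]

def endpoints(x, v, level):
    """Endpoints z, z' of the segment; assumes v^T H v < 0 and a nonempty segment."""
    a = dot(v, matvec(H, v))
    b = dot(v, grad_f(x))
    q = dot(x, matvec(H, x)) + 2.0 * dot(G, x) + 2.0 * C - 2.0 * level
    r = math.sqrt(b * b - a * q)
    t_hi, t_lo = (-b - r) / a, (-b + r) / a   # a < 0, so t_hi > t_lo
    z = [xi + t_hi * vi for xi, vi in zip(x, v)]
    zp = [xi + t_lo * vi for xi, vi in zip(x, v)]
    return z, zp

def g_val(x, v, level):
    z, zp = endpoints(x, v, level)
    return dot(v, z) - dot(v, zp)

def grad_g_squared(x, v, level):
    """2 g(x) * ( -grad f(z)/(grad f(z)^T v) + grad f(z')/(grad f(z')^T v) )."""
    z, zp = endpoints(x, v, level)
    gz, gzp = grad_f(z), grad_f(zp)
    sz, szp = dot(gz, v), dot(gzp, v)
    gl = dot(v, z) - dot(v, zp)
    return [2.0 * gl * (-a / sz + b / szp) for a, b in zip(gz, gzp)]

def finite_difference_grad(x, v, level, h=1e-6):
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((g_val(xp, v, level) ** 2 - g_val(xm, v, level) ** 2) / (2.0 * h))
    return out

if __name__ == "__main__":
    n = math.sqrt(1.04)
    v = [0.2 / n, 1.0 / n]
    print(grad_g_squared([0.3, -0.4], v, -1.0))
    print(finite_difference_grad([0.3, -0.4], v, -1.0))
```

In practice $z$ and $z'$ would come from a line search on a general $f$; the closed-form endpoints here only keep the example short.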

We now discuss the situation when we consider $g_{l,v}$ instead of its square.
Consider a quadratic function $f : \mathbb{R}^n \to \mathbb{R}$ whose Hessian
has one negative eigenvalue and $n - 1$ positive eigenvalues. For the critical
point $\bar{x}$ and critical level $f(\bar{x})$, a plot of
$\mathrm{lev}_{\le l} f$ for $l < f(\bar{x})$ has two distinct convex
components. One would expect that if $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$
at a nondegenerate saddle point $\bar{x}$ of Morse index one and
$l < f(\bar{x})$, $U \cap \mathrm{lev}_{\le l} f$ would consist of two convex
components for some neighborhood $U$ of $\bar{x}$. We have the following result
on the convexity of the level sets from [LP11].



PRINCIPLES FOR MOUNTAIN PASS ALGORITHMS, AND THE PARALLEL DISTANCE 10 




Figure 2.1. $\mathrm{lev}_{\le 0} f$ for
$f(x) = (x_2 - x_1^2)(x_1 - x_2^2)$ in Example 2.6.

Proposition 2.5. [LP11] (Convexity of level sets) Suppose that
$f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ in a neighborhood of a nondegenerate
critical point $\bar{x}$ of Morse index one. Then if $\epsilon > 0$ is small
enough, there is a convex neighborhood $U_\epsilon$ of $\bar{x}$ such that
$U_\epsilon \cap \mathrm{lev}_{\le f(\bar{x}) - \epsilon} f$ is a union of two
disjoint convex sets.

The example below shows that Proposition 2.5 may be the best possible.

Example 2.6. (Tightness in Proposition 2.5) Figure 2.1 shows the level set
$\mathrm{lev}_{\le 0} f$ for $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$f(x) = (x_2 - x_1^2)(x_1 - x_2^2)$. For this particular $f$, we have the
following.

(1) In Proposition 2.5, the neighborhood $U_\epsilon$ must satisfy
    $\mathrm{diam}(U_\epsilon) \searrow 0$ as $\epsilon \searrow 0$. In other
    words, the dependence of the neighborhood $U_\epsilon$ on the parameter
    $\epsilon$ cannot be lifted.

(2) The level set $\mathrm{lev}_{\le 0} f$ cannot be written as a union of two
    convex sets in any neighborhood of $(0, 0)$.

(3) As a consequence of Proposition 2.5 and (1), the function
    $g_{l,v} : \mathbb{R}^n \to \mathbb{R}$ is convex in $x$ in $U_\epsilon$ for
    $l = f(\bar{x}) - \epsilon$, but the region on which $g_{l,v}$ is convex
    shrinks as $l$ approaches $f(\bar{x}) = 0$.

3. Framework for a mountain pass algorithm 

In this section, we first present subroutines for a mountain pass algorithm, and 
then show how the corresponding mountain pass algorithm has local and global 
properties. 

We first present the subroutines that make up the global algorithm. 

Algorithm 3.1. (Subroutines in global mountain pass algorithm) Here are the
subroutines that will be the building blocks of our global mountain pass
algorithm.

(PD) (Parallel distance reduction) Given points $z$ and $z'$ and a level $l$
     such that $f(z) = f(z') = l$,
     (a) Let $v = z - z'$, and let $x$ be any point on the segment $[z, z']$.
     (b) From $\nabla f(z)$ and $\nabla f(z')$, determine
         $\nabla(g_{l,v}^2)(x)$. The Hessian $\nabla^2(g_{l,v}^2)(x)$ may also
         be calculated or estimated for a (quasi-)Newton method. These values
         will give a direction $d$ for decrease of $g_{l,v}^2(\cdot)$.
     (c) There is some $t > 0$ such that $g_{l,v}(x + td) < g_{l,v}(x)$. Two
         cases are possible. If $g_{l,v}(x + td) > 0$, then $z(x + td)$ and
         $z'(x + td)$ are new iterates reducing the parallel distance. If
         $g_{l,v}(x + td) = 0$, then let $x'$ be a local maximum of $f$ on the
         line $\{x + td\} + \mathbb{R}\{v\}$. We have $f(x') < l$, and we should
         run ($l\downarrow$) below.
(Av) (Adjusting vector $v$) Given points $z$ and $z'$ and a level $l$ such that
     $f(z) = f(z') = l$,
     (a) Perturb $z$ and/or $z'$ such that we still have $f(z) = f(z') = l$,
         and that $\|z - z'\|$ is reduced. The vector $v = z - z'$ is now
         adjusted.
($l\downarrow$) (Decrease level $l$) Given $x$ and $v \ne 0$ such that $x$ is a
     local maximum of $f$ on $\{x\} + \mathbb{R}\{v\}$,
     (a) Find a local minimizer of $f$ on $\{x\} + \mathbb{R}\{d\}$, where
         $d \perp v$ and $\nabla f(x)^T d < 0$. The direction $d$ can be chosen
         to be the projection of $-\nabla f(x)$ onto the subspace perpendicular
         to $v$.
($l\uparrow$) (Increase level $l$) Given points $z$ and $z'$ and a level $l$
     such that $f(z) = f(z') = l$,
     (a) Choose some $x \in [z, z']$ such that $f(x) > l$. (One choice is
         $x = \tfrac{1}{2}(z + z')$.) Perturb $z$ and $z'$ so that $f(z)$ and
         $f(z')$ equal this new value of $l$.
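The direction used when decreasing the level can be sketched in a few lines: project $-\nabla f(x)$ onto the subspace perpendicular to $v$. The function name `level_decrease_direction` is hypothetical, and the line minimization along the resulting direction $d$ is left out of this sketch.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def level_decrease_direction(grad_f_x, v):
    """Projection of -grad f(x) onto the subspace perpendicular to v (v nonzero)."""
    coef = dot(grad_f_x, v) / dot(v, v)
    return [-(gi - coef * vi) for gi, vi in zip(grad_f_x, v)]

if __name__ == "__main__":
    d = level_decrease_direction([1.0, 2.0, 3.0], [0.0, 0.0, 2.0])
    print(d)
```

By construction $d$ is orthogonal to $v$, and $\nabla f(x)^T d = -\|d\|^2$, so $d$ is a descent direction whenever it is nonzero.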

Other ways of adjusting the vector $v$ apart from (Av) are possible, though they
are not as simple as (Av). For example, the vector $v$ can also be calculated by
taking the eigenvector corresponding to the negative eigenvalue of
$\nabla^2 f(z_i)$, $\nabla^2 f(z_i')$, or some combination of the two matrices.

We gave a method of decreasing the level $l$ in ($l\downarrow$). Adjustments to
the strategy presented in ($l\downarrow$) can be made as needed. For example,
the condition $d \perp v$ can be adjusted.

There are also other reasons to adjust $l$. First, the contrapositive of Lemma
4.6(1) later can be roughly interpreted as follows: If
$1/|v^T u(\nabla f(z))|$ is too small, then the critical level is below
$f(z) = l$. We can thus reduce the level $l$. Secondly, when $g_{l,v}(x)$ is too
high, signifying that the points $z(x)$ and $z'(x)$ are too far apart, one can
increase $l$. Third, the points evaluated may not have function value $l$,
making a different value more suitable. Lastly, it is possible to estimate $l$
by setting the minimum value of $g_{l,v}(x)^2$ to be zero in the formula (2.1).

3.1. Fast local convergence. We discuss the fast local convergence properties of
the level set algorithm. In [LP11], we proved local superlinear convergence of a
level set algorithm under restrictive assumptions. We show how the difficult
steps there can be seen as limiting cases of subroutines (Av) and
($l\downarrow$).

We recall our mountain pass algorithm in [LP11].

Algorithm 3.2. [LP11] (A local superlinearly convergent algorithm) Let counter
$i$ be 0. Given points $z_0$ and $z_0'$, and a level $l_0$ such that
$f(z_0) = f(z_0') = l_0$. Let $U$ be an open neighborhood of the saddle point
$\bar{x}$ that contains $z_0$ and $z_0'$.

(1) Perturb $z_i$ and $z_i'$ so that for some open set $U$, $z_i$ and $z_i'$
    are the minimizers of the problem

$$\min_{x, y} \|x - y\| \quad \text{s.t. } x, y \text{ lie in the same component of } U \cap \mathrm{lev}_{\le l_i} f \text{ as } z_i \text{ and } z_i' \text{ respectively.} \tag{3.1}$$

(2) Let $v_i$ be the unit vector in the same direction as $z_i - z_i'$. Find
    the minimum of $f$ on $U \cap L_i$, where $L_i$ is the perpendicular
    bisector of $z_i$ and $z_i'$. Let this value be $l_{i+1}$. Find $z_{i+1}$
    and $z_{i+1}'$ such that they are points in the same components of the
    level set $U \cap \mathrm{lev}_{\le l_{i+1}} f$ as $z_i$ and $z_i'$
    respectively, and that $z_{i+1} - z_{i+1}'$ points in the same direction as
    $v_i$.

(3) Stop if $\|z_{i+1} - z_{i+1}'\|$ is sufficiently small, or if we find a
    point $x$ such that $\|\nabla f(x)\|$ is sufficiently small. Otherwise,
    increase the counter $i$, and return to step 1.

Figure 3.1. We elaborate on the possible difficulties in finding a lower bound
of critical level explained in step 2 of Algorithm 3.2. Let $L$ be the
perpendicular bisector of the two closest points as shown. The neighborhood
$U_1$ is too small as a minimizer of $f$ on $U_1 \cap L$ does not exist in the
relative interior of $U_1 \cap L$. The neighborhood $U_2$ is too large since the
minimum value of $f$ on $U_2 \cap L$ is worse than the previous lower bound on
the critical value.

Algorithm 3.2 can be built from the subroutines highlighted in Algorithm 3.1.
Step (1) can be seen as applying the step (Av) infinitely many times, while step
(2) can be seen as applying one step of ($l\uparrow$), then applying
($l\downarrow$) infinitely often till the minimizer of $f$ on $U \cap L_i$ is
reached.

The main result in [LP11] is that in some neighborhood $U$ of a nondegenerate
saddle point $\bar{x}$ of Morse index one, the steps in Algorithm 3.2 are well
defined, and Algorithm 3.2 converges locally superlinearly to $\bar{x}$. This
shows that level set methods can satisfy Principle (P1).

However, Algorithm 3.2 has some disadvantages:

(D1) Step 1 in Algorithm 3.2 is difficult to perform in practice. If an
     alternating projection method were used to solve (3.1) for example, the
     convergence would be very slow when close to the minimizers.
(D2) Related to (D1) is the problem of ensuring that
     $U \cap \mathrm{lev}_{\le l} f$ is a union of two components for some
     convex neighborhood $U$ of $\bar{x}$. This in turn requires $l$ to satisfy
     $l < f(\bar{x})$, where $f(\bar{x})$ is the critical level. Step 2 in
     Algorithm 3.2 ensures that the calculated level is an underestimate of the
     critical level, but this step may involve more computational effort than
     is necessary.

Algorithm 3.2 can be extended to a global algorithm. A few problems may arise in
the global case. Firstly, the problem of minimizing $f$ on $L_i$ is not
necessarily easy. Sometimes, $f$ may not have a local minimizer in
$U \cap L_i$. Secondly, the new estimate $l$ of the critical level
$f(\bar{x})$ may be even lower than the previous estimate, rendering it useless
as a lower bound on $f(\bar{x})$. Lastly, the estimate $l$ of the critical level
may actually be an upper estimate of $f(\bar{x})$ instead. See Figure 3.1.

Proposition 2.1 suggests that using $g_{l,v}^2(\cdot)$ overcomes the
difficulties (D1) and (D2). Provided $v$ is close enough to the eigenvector
corresponding to the negative eigenvalue of $\nabla^2 f(\bar{x})$, the function
$g_{l,v}^2(\cdot)$ restricted to any $(n - 1)$ dimensional affine space not
containing $v$ is the maximum of a quadratic with positive definite Hessian and
0. One can first minimize $g_{l,v}^2(\cdot)$ as a quadratic. Once close enough
to $\bar{x}$, the minimizer of the corresponding quadratic, say $\tilde{x}$,
will give a good estimate of $\bar{x}$.

3.2. Global convergence results. We now look at the global mountain pass
algorithm involving the subalgorithms listed in Algorithm 3.1.

Algorithm 3.3. (Global mountain pass algorithm) Let the counter $i$ be 0.
Suppose the points $z_0$ and $z_0'$ and a level $l_0$ are such that
$f(z_0) = f(z_0') = l_0$. Let $x_0$ be some point in the line segment
$[z_0, z_0']$. Let $v_0 = z_0 - z_0'$.

(1) Run (PD) on $z_i$, $z_i'$, $v_i$ and $l_i$. Three outcomes are possible:

    (a) If the new parallel distance is positive and sufficient decrease in the
        parallel distance is obtained, let the output be $z_{i+1}$ and
        $z_{i+1}'$. Let $l_{i+1} = l_i$. Run (Av), which perturbs either
        $z_{i+1}$ or $z_{i+1}'$. The vector $v_{i+1}$ is set to be the unit
        vector in the direction of $z_{i+1} - z_{i+1}'$.

    (b) If the new parallel distance is positive but the parallel distance
        changed little from previous iterations, run ($l\uparrow$) to perturb
        $z_{i+1}$ and $z_{i+1}'$, and let $l_{i+1}$ be the new level. The vector
        $v_{i+1}$ equals $v_i$, unchanged from before.

    (c) If the new parallel distance is zero, then let $l_{i+1}$ be the new
        level, and let $x_{i+1}$ be the local maximum as stated in (PD). Run
        ($l\downarrow$). The new level is still labeled as $l_{i+1}$. The vector
        $v_{i+1}$ equals $v_i$, unchanged from before.

(2) Increase $i$ by one. If in the course of the calculations, a point $x$ such
    that $\|\nabla f(x)\|$ is small is encountered, then the algorithm ends. If
    $\|z_i - z_i'\|$ is small and the distance of 0 to the convex hull of
    $\{\nabla f(z_i), \nabla f(z_i')\}$ is small, then we can extrapolate some
    point $x \in [z_i, z_i']$ for which $\|\nabla f(x)\|$ is small, and end the
    algorithm. Otherwise, go back to step 1.
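The stopping test in step (2) needs the distance from 0 to the convex hull of two gradients, which has a closed form. The sketch below computes the minimizing weight $\lambda \in [0, 1]$ of $\|\lambda a + (1 - \lambda) b\|$ and reuses it to pick a point on the segment $[z, z']$. Reusing $\lambda$ in this way is an assumption made here (it is exact only when the gradient is affine, as for a quadratic), and the function names are hypothetical.

```python
import math

def min_norm_convex_combination(ga, gb):
    """Return (lam, dist): lam in [0, 1] minimizing ||lam*ga + (1 - lam)*gb||."""
    diff = [a - b for a, b in zip(ga, gb)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        lam = 0.5  # ga == gb: every weight gives the same vector
    else:
        lam = -sum(b * d for b, d in zip(gb, diff)) / denom
        lam = min(1.0, max(0.0, lam))
    p = [lam * a + (1.0 - lam) * b for a, b in zip(ga, gb)]
    return lam, math.sqrt(sum(c * c for c in p))

def extrapolate_point(z, zp, lam):
    """Candidate x = lam*z + (1 - lam)*z' on the segment [z, z']."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(z, zp)]

if __name__ == "__main__":
    lam, dist = min_norm_convex_combination([1.0, 0.0], [-1.0, 0.0])
    print(lam, dist, extrapolate_point([0.0, 1.0], [0.0, -1.0], lam))
```

When the two gradients point in nearly opposite directions, the returned distance is small and the candidate point is a natural guess for a near-critical point between $z$ and $z'$.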

In Algorithm 3.3, the subroutines (PD) and (Av) reduce the distance between
the components of the level set $U \cap \mathrm{lev}_{\le l} f$. Algorithm 3.3 illustrates just one way
to decide which of the subroutines (PD), (Av), ($l\uparrow$) and ($l\downarrow$) to use at each step,
and other combinations are possible. There is still flexibility on whether option
1(a) or 1(b) is taken. Once close enough to the saddle point, a quadratic model
method can be used.
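Step (2) of Algorithm 3.3 does not say how the point $x \in [z_i, z'_i]$ is extrapolated. The following Python sketch shows one possible reading, under the assumption that the weights of the minimum-norm point of the segment $[\nabla f(z_i), \nabla f(z'_i)]$ are reused on $[z_i, z'_i]$; the function name `extrapolate_small_gradient` and the toy quadratic are illustrative and not taken from the paper.

```python
import numpy as np

def extrapolate_small_gradient(z, z_prime, grad_z, grad_z_prime):
    """Return (x, g_min): a point x on [z, z'] and the minimum-norm element
    g_min of the segment [grad_z, grad_z_prime] (hypothetical helper)."""
    z, z_prime = np.asarray(z, float), np.asarray(z_prime, float)
    g1, g2 = np.asarray(grad_z, float), np.asarray(grad_z_prime, float)
    d = g2 - g1
    dd = float(d @ d)
    # Minimise ||(1 - t) g1 + t g2||^2 over t in [0, 1].
    t = 0.0 if dd == 0.0 else float(np.clip(-(g1 @ d) / dd, 0.0, 1.0))
    g_min = (1.0 - t) * g1 + t * g2
    x = (1.0 - t) * z + t * z_prime  # same weights applied to the points
    return x, g_min

# Toy check on f(x) = 0.5 x^T H x, whose saddle point is the origin.
H = np.diag([3.0, -2.0])
z, z_prime = np.array([0.01, 0.3]), np.array([0.01, -0.1])
x, g_min = extrapolate_small_gradient(z, z_prime, H @ z, H @ z_prime)
```

For an exact quadratic the gradient is linear, so `g_min` is the gradient at `x`; for a general $f$ it is only an estimate.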

The basis of (P2) for both Algorithms 3.2 and 3.3 is the following result.

Theorem 3.4. ([LP11], Global convergence of level set algorithm) Let $f : X \to \mathbb{R}$.
Suppose $\{a_i\}_{i=0}^{\infty}$ and $\{b_i\}_{i=0}^{\infty}$ are sequences of points and $\{l_i\}_{i=0}^{\infty}$ is a sequence
satisfying $l_i \nearrow f(\bar{x})$. If $a_i$ and $b_i$ lie in separate components of $\{x \mid f(x) \le l_i\}$, and
$\bar{x} = \lim_{i\to\infty} a_i = \lim_{i\to\infty} b_i$, then $\bar{x}$ is a saddle point.

One difficulty is to decide whether $a_i$ and $b_i$ are in different components of
$\mathrm{lev}_{\le l_i} f$, but we can use $\nabla f(a_i)$ and $\nabla f(b_i)$ to make a guess. Note that provided
the limits exist, $\lim_{i\to\infty} a_i = \lim_{i\to\infty} b_i$ is equivalent to $\lim_{i\to\infty} \|a_i - b_i\| = 0$.
This principle can be seen as a convergence property of Algorithms 3.2 and 3.3.
It is therefore pragmatic to decrease the distance or parallel distance between the
components of the level sets, especially at the start of a global mountain pass
algorithm where the quadratic approximation is not valid yet. The problem of
choosing the sequence $\{l_i\}_{i=0}^{\infty}$ is much more difficult. The strategy in Algorithm
3.3 is adequate for our numerical experiment, but more still needs to be done.

4. Independence of $l$ in estimating $\nabla^2(g_{l,v}^2)(\cdot)$

We recall that in our level set algorithm in Section 3 we perturb the level $l$
using subroutines ($l\uparrow$) and ($l\downarrow$) so that $l$ converges to the critical value $f(\bar{x})$ of the
saddle point $\bar{x}$. Such changes in $l$ can be quite sudden. The Hessian $\nabla^2(g_{l,v}^2)(\cdot)$ is
not continuous at $\bar{x}$ because of the division by $\|\nabla f(z)\|$ in its formula, and its continuity at
other points $x$ is only good enough for small changes in $l$. In this section, we
show in Theorem 4.9 that there is a neighborhood $U$ of the saddle point $\bar{x}$ such that
as long as $l \le f(\bar{x})$ is close enough to $f(\bar{x})$ and $v$ is close enough to the eigenspace
corresponding to the negative eigenvalue of $\nabla^2 f(\bar{x})$, the Hessian $\nabla^2(g_{l,v}^2)(x)$ for
$x \in U$ can be estimated from a quadratic model of $f$ at $\bar{x}$. Such a result shows
that under changes of $l$ near $f(\bar{x})$, the Hessian $\nabla^2(g_{l,v}^2)(\cdot)$ does not depend too much
on $l$, making previous estimates of $\nabla^2(g_{l,v}^2)(\cdot)$ useful for future iterations. As a
consequence, we obtain the convexity of $g_{l,v}^2(\cdot)$.

First, we have the following result that allows us to identify convexity. 

Proposition 4.1. (Convexity from positive definite Hessians) Suppose $f : \mathbb{R}^n \to [0, \infty)$
is a continuous function that is $C^2$ at all points $x$ satisfying $f(x) > 0$, and the
corresponding Hessian $\nabla^2 f(x)$ is positive semidefinite. Then $f$ is convex. (The
issue here is that the nonsmoothness of $f$ on the boundary of $\{x \mid f(x) = 0\}$ does
not affect convexity.)

Proof. The usual convexity test $t f(x) + (1-t) f(y) \ge f(tx + (1-t)y)$ for all $x, y \in \mathbb{R}^n$
and $t \in (0, 1)$ allows us to reduce the problem in $\mathbb{R}^n$ to that of $n = 1$. We first
notice that there cannot exist $x_1, x_2 \in \mathbb{R}$ such that $x_1 < x_2$, $f(x_1) = f(x_2) = 0$,
and $f(x) > 0$ for all $x \in (x_1, x_2)$, since this is a contradiction to the convexity of $f$
on $(x_1, x_2)$.

Using the above property, we can find $x_3, x_4 \in \mathbb{R} \cup \{-\infty, \infty\}$ such that $x_3 \le x_4$
and

\[ f(x) \begin{cases} = 0 & \text{if } x \in [x_3, x_4], \\ > 0 & \text{if } x \notin [x_3, x_4]. \end{cases} \]

Note that one or both of $x_3$ and $x_4$ might be $\pm\infty$. It is an easy exercise that the
subdifferential mapping $\partial f$ is monotone, thus $f$ is convex. □
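As a small numerical illustration of Proposition 4.1, the sketch below samples the convexity inequality for a function meeting its hypotheses. The test function $\max(0, \|x\|^2 - 1)$ is an assumption chosen for illustration (it is nonnegative, continuous, and has Hessian $2I$ wherever it is positive); it does not appear in the paper, and random sampling is only a sanity check, not a proof.

```python
import numpy as np

def f(x):
    # Nonnegative, continuous, C^2 with Hessian 2I wherever f > 0;
    # nonsmooth only on the boundary ||x|| = 1 of the zero set.
    return max(0.0, float(x @ x) - 1.0)

rng = np.random.default_rng(0)
worst = -np.inf  # largest observed violation of the convexity inequality
for _ in range(10000):
    x = rng.uniform(-3.0, 3.0, size=2)
    y = rng.uniform(-3.0, 3.0, size=2)
    t = rng.uniform()
    gap = f(t * x + (1 - t) * y) - (t * f(x) + (1 - t) * f(y))
    worst = max(worst, gap)
```

A value of `worst` that is nonpositive up to rounding is consistent with convexity despite the kink on the unit circle.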

We shall make use of Proposition 4.1 to establish the convexity of $g_{l,v}^2$ by making
sure that the Hessian $\nabla^2(g_{l,v}^2)$ is positive semidefinite whenever $g_{l,v} > 0$.
We make some simplifying assumptions for the rest of this section. 

Assumption 4.2. (Smooth f) Assume that $f : \mathbb{R}^n \to \mathbb{R}$ is a $C^2$ function with a
nondegenerate critical point $\bar{x} = 0$ of Morse index one satisfying $f(0) = 0$ such that
$H = \nabla^2 f(0)$ is diagonal with entries arranged in decreasing manner as $\lambda_1, \ldots, \lambda_n$.
This means that the diagonal entries of $\nabla^2 f(0)$ consist of $n-1$ positive eigenvalues
and one negative eigenvalue. Let the eigenvector corresponding to the negative
eigenvalue $\lambda_n$ be $\bar{v}$.

We also make another definition that will simplify many of the statements in
this section. Denote $f_\delta : \mathbb{R}^n \to \mathbb{R}$ by

\[ f_\delta(x) = \tfrac{1}{2} x^T [\nabla^2 f(0) + \delta I] x. \tag{4.1} \]

Let $g_{\delta,l,v}$ be the value of $g_{l,v}$ defined through the quadratic $f_\delta(\cdot)$ (instead of through
$f(\cdot)$). The values $z_\delta(x)$ and $z'_\delta(x)$, defined through $f_\delta$, will be of use later in this
section. We write $\hat{f} = f_0$, $\hat{g}_{l,v} = g_{0,l,v}$, $\hat{z}(\cdot) = z_0(\cdot)$ and $\hat{z}'(\cdot) = z'_0(\cdot)$. We also write
$H_\delta = \nabla^2 f(0) + \delta I$ to simplify notation.

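Definition 4.3. (Continuity condition) For a function $f : \mathbb{R}^n \to \mathbb{R}$ satisfying
Assumption 4.2, $\delta > 0$, $\gamma > 0$ and convex neighborhoods $U_\delta$ and $U'_\delta$ of $0$ such that
$U_\delta \subset U'_\delta$, we say that condition $P(f, \delta, \gamma, U_\delta, U'_\delta)$ is satisfied if

(1) $\|\nabla^2 f(x) - \nabla^2 f(0)\| < \delta$ for all $x \in U'_\delta$,

(2) For $f_\delta : \mathbb{R}^n \to \mathbb{R}$ and $S_{\delta,l,v}(\cdot)$ as defined in (2.2), we have $S_{\delta,l,v}(x) \subset U'_\delta$ for
all $x \in U_\delta$ and $l \in (-\gamma, 0]$.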

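It is clear through Proposition 2.2 and the continuity of the Hessian that for
any $\delta > 0$, there must be convex neighborhoods $U_\delta$ and $U'_\delta$ such that $U_\delta \subset U'_\delta$ and
$P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds. It is also clear that if $P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds, we have

\[ \tfrac{1}{2} x^T [\nabla^2 f(0) - \delta I] x \le f(x) \le \tfrac{1}{2} x^T [\nabla^2 f(0) + \delta I] x \quad \text{for all } x \in U_\delta. \tag{4.2} \]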

The next result is a bound on the error in z(x). 

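Lemma 4.4. (Controlling $z(x)$) Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is $C^2$ and satisfies
Assumption 4.2. Let $v$ be a unit vector such that $v^T \nabla^2 f(0) v < 0$. For any $\epsilon > 0$,
there are $\delta > 0$, $\gamma > 0$ and convex neighborhoods $U_\delta$ and $U'_\delta$ of $0$ such that
$P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds, and for all $x \in U_\delta$ and $l \in (-\gamma, 0]$, we have

\[ \|\hat{z}(x) - z(x)\| \le \epsilon \|\hat{z}(x)\|, \]
\[ \|\hat{z}'(x) - z'(x)\| \le \epsilon \|\hat{z}(x)\|, \]
\[ \text{and } |\hat{g}_{l,v}(x) - g_{l,v}(x)| \le \epsilon \|\hat{z}(x)\|. \]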

Proof. Since $\hat{z}(x)$ and $z(x)$ lie inside the line segment $[z_\delta(x), z_{-\delta}(x)]$, we have
$\|\hat{z}(x) - z(x)\| \le \|z_\delta(x) - z_{-\delta}(x)\|$. Since $z_\delta(x)$, $z_{-\delta}(x)$, $z'_{-\delta}(x)$ and $z'_\delta(x)$ line
up in a line (with direction $v$) in that order, we have

\[ \|z_\delta(x) - z_{-\delta}(x)\| \le \|z_\delta(x) - z_{-\delta}(x)\| + \|z'_\delta(x) - z'_{-\delta}(x)\| = g_{\delta,l,v}(x) - g_{-\delta,l,v}(x). \]

Similarly,

\[ |\hat{g}_{l,v}(x) - g_{l,v}(x)| \le \|z_\delta(x) - z_{-\delta}(x)\| + \|z'_\delta(x) - z'_{-\delta}(x)\| = g_{\delta,l,v}(x) - g_{-\delta,l,v}(x). \]

Our goal is therefore to prove that for every $\epsilon > 0$, we can find a $\delta > 0$ such that
$g_{\delta,l,v}(x) - g_{-\delta,l,v}(x) \le \epsilon \|\hat{z}(x)\|$ for all $x \in \mathbb{R}^n$.

Note that our problem has now been transformed to a new problem on an exact
quadratic $\hat{f}(\cdot)$.
The treatments for the cases $l = 0$ and $l < 0$ are different, and we start off by
treating the case $l = 0$.

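CASE $l = 0$: For a point $x \in \mathbb{R}^n$, the sets $\mathrm{lev}_{\le 0} f_\delta$ and $\mathrm{lev}_{\le 0} f_{-\delta}$ are cones,
with $\mathrm{lev}_{\le 0} f_\delta \subset \mathrm{lev}_{\le 0} f_{-\delta}$. For $\delta > 0$ small enough, the matrices $H_{\pm\delta} = \nabla^2 f(0) \pm \delta I$ each have $n - 1$ positive
eigenvalues and one negative eigenvalue, so $\mathrm{lev}_{\le 0} f_\delta$ and $\mathrm{lev}_{\le 0} f_{-\delta}$ are both the union
of two convex cones intersecting only at $0$. For a point $x$, the points $z_\delta(x)$, $z_{-\delta}(x)$,
$z'_\delta(x)$ and $z'_{-\delta}(x)$ can be calculated easily from the quadratic formulas we have seen
in the proof of previous results (in particular, Proposition 2.1), giving

\[ g_{\delta,l,v}(x) = \|z_\delta(x) - z'_\delta(x)\| = \frac{2\sqrt{[v^T H_\delta x]^2 - [v^T H_\delta v][x^T H_\delta x]}}{|v^T H_\delta v|} \]

\[ \text{and } g_{-\delta,l,v}(x) = \|z_{-\delta}(x) - z'_{-\delta}(x)\| = \frac{2\sqrt{[v^T H_{-\delta} x]^2 - [v^T H_{-\delta} v][x^T H_{-\delta} x]}}{|v^T H_{-\delta} v|}. \]

Consider the problem

\[ \max_{x \in \partial\mathbb{B}} h_\delta(x), \]

where $h_\delta(x) = g_{\delta,l,v}(x) - g_{-\delta,l,v}(x)$. The function $h_\delta(\cdot)$ is continuous, and the set
$\partial\mathbb{B} := \{x : \|x\| = 1\}$ is compact. The optimization problem above satisfies the
conditions in Proposition 4.5, so for any $\epsilon > 0$, we can choose $\delta > 0$ such that
$\max_{x \in \partial\mathbb{B}} h_\delta(x) < \epsilon$. We have

\[ h_\delta(\hat{z}(x)) \le \|\hat{z}(x)\| \max_{y \in \partial\mathbb{B}} h_\delta(y) \le \epsilon \|\hat{z}(x)\|. \]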

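CASE $l < 0$: We can consider the case $l = -1/2$ first. The other cases follow
by a scaling.

If $\|x\| \ge 1/\sqrt{\delta}$, then $f_{2\delta}(x) \le 0$ implies that $f_\delta(x) = f_{2\delta}(x) - \tfrac{1}{2}\delta\|x\|^2 \le -\tfrac{1}{2}$.
Also, $f_\delta(x) \le -\tfrac{1}{2}$ clearly implies $f_{-2\delta}(x) \le f_{-\delta}(x) \le 0$. This gives

\[ \left[\tfrac{1}{\sqrt{\delta}}\mathbb{B}\right]^c \cap \mathrm{lev}_{\le 0} f_{2\delta} \;\subset\; \left[\tfrac{1}{\sqrt{\delta}}\mathbb{B}\right]^c \cap \mathrm{lev}_{\le -1/2} f_{\delta} \;\subset\; \left[\tfrac{1}{\sqrt{\delta}}\mathbb{B}\right]^c \cap \mathrm{lev}_{\le -1/2} f_{-\delta} \;\subset\; \left[\tfrac{1}{\sqrt{\delta}}\mathbb{B}\right]^c \cap \mathrm{lev}_{\le 0} f_{-2\delta}, \]

where $[\cdot]^c$ is the complementation of a set. By the treatment for the case $l = 0$, for
any $\epsilon > 0$, we can find $\delta_1 > 0$ such that if $\|\hat{z}(x)\| > 1/\sqrt{\delta_1}$, then $\|z_{\delta_1}(x) - z_{-\delta_1}(x)\| \le
\tfrac{\epsilon}{2}\|\hat{z}(x)\|$. Therefore, if $\delta_2 \in (0, \delta_1)$, we have $\|z_{\delta_2}(x) - z_{-\delta_2}(x)\| \le \tfrac{\epsilon}{2}\|\hat{z}(x)\|$, which
gives

\[ g_{\delta_2,l,v}(x) - g_{-\delta_2,l,v}(x) \le \epsilon \|\hat{z}(x)\|. \]

We still need to treat the case where $\|\hat{z}(x)\| \le 1/\sqrt{\delta_1}$. The condition $\hat{f}(\hat{z}(x)) =
-\tfrac{1}{2}$ implies that $\|\hat{z}(x)\| \ge 1/\sqrt{-\lambda_n + 2\delta_1}$, where $\lambda_n$ is the negative eigenvalue of
$\nabla^2 f(0)$. We make use of the same strategy to estimate $\|z_\delta(x) - z_{-\delta}(x)\|$ as in the
last case. This time, the formulas give

\[ g_{\delta,l,v}(x) - g_{-\delta,l,v}(x) = \frac{2\sqrt{[v^T H_\delta x]^2 - [v^T H_\delta v][x^T H_\delta x - 2l]}}{|v^T H_\delta v|} - \frac{2\sqrt{[v^T H_{-\delta} x]^2 - [v^T H_{-\delta} v][x^T H_{-\delta} x - 2l]}}{|v^T H_{-\delta} v|}. \]

We are led to consider the problem

\[ \max_{x \in C} h_\delta(x), \]

where

\[ h_\delta(x) := g_{\delta,l,v}(x) - g_{-\delta,l,v}(x), \quad \text{and} \quad C := \left\{ y : 1/\sqrt{-\lambda_n + 2\delta_1} \le \|y\| \le 1/\sqrt{\delta_1} \right\}. \]

Once again, $h_\delta(\cdot)$ is continuous, $C$ is compact, and Proposition 4.5 can be applied.
There is some $\delta_2$ such that $0 < \delta_2 < \delta_1$ and $\max_{x \in C} h_{\delta_2}(x) \le \epsilon/\sqrt{-\lambda_n + 2\delta_1}$. If
$\|\hat{z}(x)\| \in [1/\sqrt{-\lambda_n + 2\delta_1}, 1/\sqrt{\delta_1}]$, then we have

\[ h_{\delta_2}(\hat{z}(x)) \le \epsilon/\sqrt{-\lambda_n + 2\delta_1} \le \epsilon \|\hat{z}(x)\|. \]

The case where $l$ is another negative number differs from the case $l = -1/2$ by a
scaling. Our claim follows. □

Here is a result that we have used for Lemma 4.4.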



Proposition 4.5. (Convergence to zero of maximum value) Suppose that $h_\delta :
C \to \mathbb{R}$ is continuous for all $\delta > 0$, $C$ is compact, and that $\delta_1 < \delta_2$ implies
$h_{\delta_1}(x) \le h_{\delta_2}(x)$ for all $x \in C$. Assume also that for all $x \in C$, $h_\delta(x) \searrow 0$ as $\delta \searrow 0$.
Then $\max_{x \in C} h_\delta(x) \searrow 0$ as $\delta \searrow 0$.

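Proof. For each sequence $\delta_i \searrow 0$ as $i \to \infty$, there is a maximizer $x_i$ such that
$h_{\delta_i}(x_i) = \max_{x \in C} h_{\delta_i}(x)$. It suffices to show that $h_{\delta_i}(x_i) \searrow 0$ as $i \to \infty$. Due to the
compactness of $C$, we can assume that there is a subsequence of $\{x_i\}$ converging
to some $\bar{x} \in C$. For any $\epsilon > 0$, there is some index $a$ such that $h_{\delta_a}(\bar{x}) < \epsilon$ and a
neighborhood $U_\epsilon$ of $\bar{x}$ such that $h_{\delta_a}(x) < 2\epsilon$ for all $x \in U_\epsilon$. This means that some tail
of the sequence $\{h_{\delta_i}(x_i)\}_{i=1}^{\infty}$ is less than $2\epsilon$. Since $\epsilon$ is arbitrary, $\max_{x \in C} h_{\delta_i}(x) =
h_{\delta_i}(x_i) \searrow 0$ as $i \nearrow \infty$ as needed. □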
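The way Proposition 4.5 is used in the case $l = 0$ of Lemma 4.4 can be illustrated numerically. The sketch below evaluates $h_\delta = g_{\delta,0,v} - g_{-\delta,0,v}$ on a grid of the unit circle for a two dimensional quadratic and records the maximum for decreasing $\delta$. The matrix `H`, the direction `v`, the grid and the helper name `parallel_distance_l0` are illustrative assumptions, not values from the paper.

```python
import numpy as np

def parallel_distance_l0(H, v, X):
    """Parallel distance at level l = 0 for f(x) = 0.5 x^T H x, evaluated at
    the rows of X: 2 sqrt(max(0, (v'Hx)^2 - (v'Hv)(x'Hx))) / |v'Hv|."""
    a = v @ H @ v
    b = X @ (H @ v)                         # v^T H x for each row (H symmetric)
    c = np.einsum("ij,jk,ik->i", X, H, X)   # x^T H x for each row
    disc = np.maximum(b * b - a * c, 0.0)
    return 2.0 * np.sqrt(disc) / abs(a)

H = np.diag([2.0, -1.0])                    # one positive, one negative eigenvalue
v = np.array([0.1, 1.0])
v /= np.linalg.norm(v)                      # unit vector with v^T H v < 0
theta = np.linspace(0.0, 2.0 * np.pi, 721)
circle = np.column_stack([np.cos(theta), np.sin(theta)])

deltas = [0.1, 0.01, 0.001]
max_h = []
for delta in deltas:
    shift = delta * np.eye(2)
    h = parallel_distance_l0(H + shift, v, circle) - parallel_distance_l0(H - shift, v, circle)
    max_h.append(float(h.max()))
```

In this example the entries of `max_h` shrink with `delta`, which is the behavior the proposition guarantees on the compact set.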

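For $x \ne 0$, let $u(x) = x/\|x\|$. Here are some bounds we need to check:

Lemma 4.6. (Uniform bounds on terms) Suppose that $\hat{f} : \mathbb{R}^n \to \mathbb{R}$ is defined
by $\hat{f}(x) = \tfrac{1}{2} x^T \nabla^2 f(0) x$, where $f : \mathbb{R}^n \to \mathbb{R}$ is a function satisfying Assumption
4.2. Let $\bar{v}$ be the eigenvector corresponding to the negative eigenvalue of $\nabla^2 f(0)$.
Assume that $v$ is a unit vector such that $\|v - \bar{v}\| \le \alpha$. Let $z$ be a point such that
$\hat{f}(z) \le 0$. We have the following:

(1) \[ \frac{1}{|v^T u(\nabla \hat{f}(z))|} \le \frac{1}{\sqrt{\frac{-\lambda_n}{\max(\lambda_1, -\lambda_n) - \lambda_n}} - \alpha} \] for all $z$ such that $\hat{f}(z) \le 0$.

(2) \[ \frac{\hat{g}_{l,v}(z)}{\|\nabla \hat{f}(z)\|} \le \frac{2}{|\lambda_n| - 2\alpha|\lambda_n| - \alpha^2 \max(\lambda_1, -\lambda_n)} \] for all $z$ satisfying $\hat{f}(z) = l$, where $l \le 0$.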
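As a sanity check of the two bounds in Lemma 4.6, the sketch below samples points $z$ with $\hat{f}(z) \le 0$ for a diagonal Hessian and compares both sides. The eigenvalues, the perturbation of $\bar{v}$ and the sampling scheme are illustrative assumptions; the closed form used for $\hat{g}_{l,v}(z)$ is the quadratic-root formula with $l = \hat{f}(z)$.

```python
import numpy as np

lam = np.array([3.0, 1.0, -2.0])        # lambda_1 >= ... > 0 > lambda_n (assumed values)
H = np.diag(lam)
v_bar = np.array([0.0, 0.0, 1.0])       # eigenvector for lambda_n
v = v_bar + np.array([0.05, -0.03, 0.0])
v /= np.linalg.norm(v)
alpha = np.linalg.norm(v - v_bar)

lam1, lamn = lam[0], lam[-1]
mx = max(lam1, -lamn)
lower_1 = np.sqrt(-lamn / (mx - lamn)) - alpha                         # part (1): |v^T u| >= lower_1
upper_2 = 2.0 / (abs(lamn) - 2 * alpha * abs(lamn) - alpha**2 * mx)    # part (2)

rng = np.random.default_rng(1)
ok_1, ok_2 = True, True
for _ in range(5000):
    z = rng.normal(size=3)
    # Stretch the last coordinate so that f_hat(z) = 0.5 z^T H z <= 0.
    s = lam[:2] @ z[:2] ** 2
    z[2] = np.sign(z[2]) * np.sqrt(s / -lamn) * (1.0 + rng.uniform())
    Hz = H @ z
    ok_1 = ok_1 and bool(abs(v @ Hz) / np.linalg.norm(Hz) >= lower_1 - 1e-12)
    g = 2.0 * abs(v @ Hz) / abs(v @ H @ v)   # parallel distance at level l = f_hat(z)
    ok_2 = ok_2 and bool(g / np.linalg.norm(Hz) <= upper_2 + 1e-12)
```

Both flags staying `True` is consistent with the lemma; the bounds are only meaningful when `alpha` is small enough that `lower_1` and the denominator of `upper_2` are positive.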



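Proof. Let $H$ be $\nabla^2 f(0)$, which we recall is diagonal. We prove (1) and (2).

(1) We have

\[ |v^T u(\nabla \hat{f}(z))| \ge |\bar{v}^T u(\nabla \hat{f}(z))| - |(v - \bar{v})^T u(\nabla \hat{f}(z))| \ge \frac{|\bar{v}^T H z|}{\|Hz\|} - \alpha = \sqrt{\frac{\lambda_n^2 z_n^2}{\sum_{i=1}^{n} \lambda_i^2 z_i^2}} - \alpha. \]

From the fact that $\hat{f}(z) \le 0$, we have

\[ \sum_{i=1}^{n-1} \lambda_i z_i^2 \le -\lambda_n z_n^2. \]

Therefore

\[ \frac{\lambda_n^2 z_n^2}{\sum_{i=1}^{n} \lambda_i^2 z_i^2} \ge \frac{-\lambda_n \sum_{i=1}^{n-1} \lambda_i z_i^2}{\sum_{i=1}^{n-1} \lambda_i^2 z_i^2 - \lambda_n \sum_{i=1}^{n-1} \lambda_i z_i^2} \ge \frac{-\lambda_n \sum_{i=1}^{n-1} \lambda_i z_i^2}{\max(\lambda_1, -\lambda_n) \sum_{i=1}^{n-1} \lambda_i z_i^2 - \lambda_n \sum_{i=1}^{n-1} \lambda_i z_i^2} = \frac{-\lambda_n}{\max(\lambda_1, -\lambda_n) - \lambda_n}. \]

The rest of the claim is straightforward.

(2) Through calculations we have seen in the proof of Lemma 4.4, we have

\[ \hat{g}_{l,v}(z) = \frac{2\sqrt{[v^T H z]^2 - [v^T H v][z^T H z - 2l]}}{|v^T H v|} = 2\left|\frac{v^T H z}{v^T H v}\right|, \]

since $l = \hat{f}(z) = \tfrac{1}{2} z^T H z$. Now,

\[ |v^T H v| \ge |\bar{v}^T H \bar{v}| - 2|(v - \bar{v})^T H \bar{v}| - |(v - \bar{v})^T H (v - \bar{v})| \ge |\lambda_n| - 2\alpha|\lambda_n| - \alpha^2 \max(\lambda_1, -\lambda_n). \]

Finally,

\[ \frac{\hat{g}_{l,v}(z)}{\|\nabla \hat{f}(z)\|} = \frac{2|v^T H z|}{\|Hz\| \, |v^T H v|} \le \frac{2\|v\| \|Hz\|}{\|Hz\| \left[|\lambda_n| - 2\alpha|\lambda_n| - \alpha^2 \max(\lambda_1, -\lambda_n)\right]} = \frac{2}{|\lambda_n| - 2\alpha|\lambda_n| - \alpha^2 \max(\lambda_1, -\lambda_n)}. \]

□

We will use the following result.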



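Proposition 4.7. (Products and norms) Let $A_i$ and $\hat{A}_i$, where $i = 1, \ldots, k$, be
matrices such that the products $A_1 A_2 \cdots A_k$ and $\hat{A}_1 \hat{A}_2 \cdots \hat{A}_k$ are valid. Then

\[ \|\hat{A}_1 \hat{A}_2 \cdots \hat{A}_k - A_1 A_2 \cdots A_k\| \le \prod_{i=1}^{k} \left(\|A_i\| + \|\hat{A}_i - A_i\|\right) - \prod_{i=1}^{k} \|A_i\|. \tag{4.3} \]

Proof. The formula follows readily from

\[ \hat{A}_1 \hat{A}_2 \cdots \hat{A}_k - A_1 A_2 \cdots A_k = [A_1 + (\hat{A}_1 - A_1)] \cdots [A_k + (\hat{A}_k - A_k)] - A_1 A_2 \cdots A_k. \]

□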
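The bound (4.3) is easy to test numerically. The sketch below draws random square matrices, perturbs them slightly, and compares both sides in the spectral norm; the sizes, the perturbation scale and the helper name `product_gap_bound` are illustrative assumptions.

```python
import numpy as np
from functools import reduce

def product_gap_bound(As, A_hats):
    """Right-hand side of (4.3), computed with the spectral norm."""
    norms = [np.linalg.norm(A, 2) for A in As]
    diffs = [np.linalg.norm(A_hat - A, 2) for A, A_hat in zip(As, A_hats)]
    return np.prod([n + d for n, d in zip(norms, diffs)]) - np.prod(norms)

rng = np.random.default_rng(2)
As = [rng.normal(size=(3, 3)) for _ in range(4)]
A_hats = [A + 0.01 * rng.normal(size=(3, 3)) for A in As]

# Left-hand side: norm of the difference of the two ordered products.
lhs = np.linalg.norm(reduce(np.matmul, A_hats) - reduce(np.matmul, As), 2)
rhs = product_gap_bound(As, A_hats)
```

Any submultiplicative norm could be used in place of the spectral norm; the inequality relies only on the triangle inequality and submultiplicativity after expanding the product.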
Lemma 4.8. (Uniform bounds on differences) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ satisfies
Assumption 4.2. Assume that $v$ is a unit vector such that $\|v - \bar{v}\| \le \alpha$. For every
$\epsilon > 0$, there exist $\delta > 0$, $\gamma > 0$ and convex neighborhoods $U_\delta$ and $U'_\delta$ of $0$ such that
$P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds, and for all $x \in U_\delta$, we have

(1) \[ \|u(\nabla \hat{f}(\hat{z}(x))) - u(\nabla f(z(x)))\| < \epsilon, \]

(2) \[ \left| \frac{1}{u(\nabla \hat{f}(\hat{z}(x)))^T v} - \frac{1}{u(\nabla f(z(x)))^T v} \right| < \epsilon, \]

(3) \[ \left| \frac{\hat{g}_{l,v}(z)}{\|\nabla \hat{f}(\hat{z}(x))\|} - \frac{g_{l,v}(z)}{\|\nabla f(z(x))\|} \right| < \epsilon, \]

where $f(z(x)) = l$ and $l \in (-\gamma, 0]$.



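Proof. We use Lemma 4.4, which says that for $\epsilon_1 > 0$, there are $\delta > 0$, $\gamma > 0$
and convex neighborhoods $U_\delta$ and $U'_\delta$ of $0$ such that $P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds, and
$\|\hat{z}(x) - z(x)\| \le \epsilon_1 \|\hat{z}(x)\|$ for all $x \in U_\delta$.

(1) We can easily obtain $\|z(x)\| \le (1 + \epsilon_1)\|\hat{z}(x)\|$. Let $H = \nabla^2 f(0)$. Now,

\[ \begin{aligned}
\|\nabla \hat{f}(\hat{z}(x)) - \nabla f(z(x))\| &\le \|\nabla \hat{f}(\hat{z}(x)) - \nabla \hat{f}(z(x))\| + \|\nabla \hat{f}(z(x)) - \nabla f(z(x))\| \\
&\le \|H(\hat{z}(x) - z(x))\| + \left\| H z(x) - \left[\int_0^1 \nabla^2 f(t z(x))\,dt\right] z(x) \right\| \\
&\le \epsilon_1 \|H\| \|\hat{z}(x)\| + \|z(x)\| \left\| H - \int_0^1 \nabla^2 f(t z(x))\,dt \right\| \\
&\le \epsilon_1 \|\hat{z}(x)\| \|H\| + \delta(1 + \epsilon_1)\|\hat{z}(x)\| \\
&= \|\hat{z}(x)\| \left[\epsilon_1 \|H\| + \delta(1 + \epsilon_1)\right].
\end{aligned} \]

Note that the term $[\epsilon_1 \|H\| + \delta(1 + \epsilon_1)]$ can be made arbitrarily small. Note that
$\frac{\|\nabla \hat{f}(\hat{z}(x))\|}{\|\hat{z}(x)\|} \ge \min(|\lambda_{n-1}|, |\lambda_n|)$, so

\[ \frac{\|\nabla \hat{f}(\hat{z}(x)) - \nabla f(z(x))\|}{\|\nabla \hat{f}(\hat{z}(x))\|} \le \frac{\epsilon_1 \|H\| + \delta(1 + \epsilon_1)}{\min(|\lambda_{n-1}|, |\lambda_n|)}. \tag{4.4} \]

Next, for any $w_1, w_2 \in \mathbb{R}^n \setminus \{0\}$, we have

\[ \begin{aligned}
\left\| \frac{w_1}{\|w_1\|} - \frac{w_2}{\|w_2\|} \right\| &\le \left\| \frac{w_1}{\|w_1\|} - \frac{w_2}{\|w_1\|} \right\| + \left\| \frac{w_2}{\|w_1\|} - \frac{w_2}{\|w_2\|} \right\| \\
&= \frac{\|w_1 - w_2\|}{\|w_1\|} + \|w_2\| \frac{\big| \|w_2\| - \|w_1\| \big|}{\|w_1\| \|w_2\|} \\
&\le \frac{2\|w_1 - w_2\|}{\|w_1\|}.
\end{aligned} \tag{4.5} \]

Apply the observation in (4.5) to (4.4) to get what we need.

(2) Let $M = \dfrac{1}{\sqrt{\frac{-\lambda_n}{\max(\lambda_1, -\lambda_n) - \lambda_n}} - \alpha}$. We have from Lemma 4.6(1) that
$\frac{1}{|u(\nabla \hat{f}(\hat{z}(x)))^T v|} \le M$. From (1), for $\epsilon_1 > 0$, there exist $\delta > 0$, $\gamma > 0$ and convex
neighborhoods $U_\delta$ and $U'_\delta$ of $0$ such that $P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds, and
$|u(\nabla \hat{f}(\hat{z}(x)))^T v - u(\nabla f(z(x)))^T v| < \epsilon_1$ for all $x \in U_\delta$. First,

\[ |u(\nabla f(z(x)))^T v| \ge |u(\nabla \hat{f}(\hat{z}(x)))^T v| - |u(\nabla \hat{f}(\hat{z}(x)))^T v - u(\nabla f(z(x)))^T v| \ge \frac{1}{M} - \epsilon_1, \]

so that

\[ \frac{1}{|u(\nabla f(z(x)))^T v|} \le \frac{M}{1 - \epsilon_1 M}. \]

Next,

\[ \begin{aligned}
\left| \frac{1}{u(\nabla \hat{f}(\hat{z}(x)))^T v} - \frac{1}{u(\nabla f(z(x)))^T v} \right| &= \frac{\left| v^T \left[u(\nabla \hat{f}(\hat{z}(x))) - u(\nabla f(z(x)))\right] \right|}{|u(\nabla \hat{f}(\hat{z}(x)))^T v| \, |u(\nabla f(z(x)))^T v|} \\
&\le M \cdot \frac{M}{1 - \epsilon_1 M} \left\| u(\nabla \hat{f}(\hat{z}(x))) - u(\nabla f(z(x))) \right\| \\
&\le \frac{\epsilon_1 M^2}{1 - \epsilon_1 M}.
\end{aligned} \]

As the RHS converges to zero as $\epsilon_1 \searrow 0$, we are done.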
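(3) We use Proposition 4.7 to get

\[ \begin{aligned}
\left| \frac{\hat{g}_{l,v}(z)}{\|\nabla \hat{f}(\hat{z}(x))\|} - \frac{g_{l,v}(z)}{\|\nabla f(z(x))\|} \right|
&\le \left[ |\hat{g}_{l,v}(z) - g_{l,v}(z)| + |\hat{g}_{l,v}(z)| \right] \left[ \left| \frac{1}{\|\nabla \hat{f}(\hat{z}(x))\|} - \frac{1}{\|\nabla f(z(x))\|} \right| + \frac{1}{\|\nabla \hat{f}(\hat{z}(x))\|} \right] - \frac{|\hat{g}_{l,v}(z)|}{\|\nabla \hat{f}(\hat{z}(x))\|} \\
&= \frac{|\hat{g}_{l,v}(z) - g_{l,v}(z)|}{\|\nabla \hat{f}(\hat{z}(x))\|} + \frac{|\hat{g}_{l,v}(z) - g_{l,v}(z)|}{\|\nabla \hat{f}(\hat{z}(x))\|} \left| 1 - \frac{\|\nabla \hat{f}(\hat{z}(x))\|}{\|\nabla f(z(x))\|} \right| + \frac{|\hat{g}_{l,v}(z)|}{\|\nabla \hat{f}(\hat{z}(x))\|} \left| 1 - \frac{\|\nabla \hat{f}(\hat{z}(x))\|}{\|\nabla f(z(x))\|} \right|.
\end{aligned} \]

By Lemma 4.4, for any $\epsilon_1 > 0$, there exist $\delta > 0$, $\gamma > 0$ and convex neighborhoods
$U_\delta$ and $U'_\delta$ of $0$ such that $P(f, \delta, \gamma, U_\delta, U'_\delta)$ holds, and $|\hat{g}_{l,v}(z) - g_{l,v}(z)| \le \epsilon_1 \|\hat{z}(x)\|$
for all $x \in U_\delta$. So

\[ \frac{|\hat{g}_{l,v}(z) - g_{l,v}(z)|}{\|\nabla \hat{f}(\hat{z}(x))\|} \le \frac{\epsilon_1 \|\hat{z}(x)\|}{\min(\lambda_{n-1}, |\lambda_n|)\|\hat{z}(x)\|} = \frac{\epsilon_1}{\min(\lambda_{n-1}, |\lambda_n|)}. \]

Next, we may reduce $\delta$ if necessary so that $\|\nabla \hat{f}(\hat{z}(x)) - \nabla f(z(x))\| \le \epsilon_1 \|\hat{z}(x)\|$ for
all $x \in U_\delta$ as well. We have

\[ \begin{aligned}
\left| 1 - \frac{\|\nabla \hat{f}(\hat{z}(x))\|}{\|\nabla f(z(x))\|} \right| &= \frac{\big| \|\nabla f(z(x))\| - \|\nabla \hat{f}(\hat{z}(x))\| \big|}{\|\nabla f(z(x))\|} \\
&\le \frac{\|\nabla f(z(x)) - \nabla \hat{f}(\hat{z}(x))\|}{\|\nabla f(z(x))\|} \\
&\le \frac{\|\nabla f(z(x)) - \nabla \hat{f}(\hat{z}(x))\|}{\|\nabla \hat{f}(\hat{z}(x))\| - \|\nabla f(z(x)) - \nabla \hat{f}(\hat{z}(x))\|} \\
&\le \frac{\epsilon_1 \|\hat{z}(x)\|}{\min(|\lambda_n|, |\lambda_{n-1}|)\|\hat{z}(x)\| - \epsilon_1 \|\hat{z}(x)\|} \\
&= \frac{\epsilon_1}{\min(|\lambda_n|, |\lambda_{n-1}|) - \epsilon_1}.
\end{aligned} \]

The RHS of both previous formulas converge to zero as $\epsilon_1 \searrow 0$, and $\frac{|\hat{g}_{l,v}(z)|}{\|\nabla \hat{f}(\hat{z}(x))\|}$ is
uniformly bounded by Lemma 4.6(2). We have (3) as needed. □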

We have the following theorem. 

Theorem 4.9. (Hessian behavior and convexity) Let $f : \mathbb{R}^n \to \mathbb{R}$ be such that $f$ is
$C^2$ in a neighborhood of a nondegenerate critical point $\bar{x}$ with Morse index one, and
let $\bar{v}$ be the eigenvector corresponding to the negative eigenvalue of $\nabla^2 f(\bar{x})$. Let $v$
be a unit vector such that $\|v - \bar{v}\| = \alpha$ is small. Then for any $\epsilon > 0$, there are
$\delta > 0$, $\gamma > 0$ and convex neighborhoods $U_\delta$ and $U'_\delta$ satisfying $P(f, \delta, \gamma, U_\delta, U'_\delta)$ such
that

• For all $l \in (f(\bar{x}) - \gamma, f(\bar{x})]$, $g_{l,v}^2 : U_\delta \to \mathbb{R}$ is convex on $U_\delta$.

• $\|\nabla^2 g_{l,v}^2(x) - \nabla^2 \hat{g}_{l,v}^2(x)\| \le \epsilon$ for all $x$ satisfying $g_{l,v}(x) > 0$. Here
$\nabla^2 \hat{g}_{l,v}^2(x)$ equals $\frac{8}{(v^T H v)^2}\left[H v v^T H - (v^T H v) H\right]$, where $H = \nabla^2 f(\bar{x})$, by
Proposition 2.1.

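Proof. The formulas for the Hessian $\nabla^2 \hat{g}_{l,v}^2(x)$ in Proposition 2.4 and Proposition
2.1 are equal. We want to show that for all $\epsilon > 0$, there exists $\delta > 0$ and a
neighborhood $U_\delta$ of $\bar{x}$ such that $\|\nabla^2 \hat{g}_{l,v}^2(x) - \nabla^2 g_{l,v}^2(x)\| < \epsilon$ for all $x \in U_\delta$. Without
loss of generality, suppose Assumption 4.2 holds. The formulas for $\nabla^2 g_{l,v}^2(x)$ and
$\nabla^2 \hat{g}_{l,v}^2(x)$ in Proposition 2.4 can be written as

\[ \begin{aligned}
\nabla^2 (g_{l,v}^2)(x) ={}& 2 \left( \frac{u(\nabla f(z))}{u(\nabla f(z))^T v} - \frac{u(\nabla f(z'))}{u(\nabla f(z'))^T v} \right) \left( \frac{u(\nabla f(z))}{u(\nabla f(z))^T v} - \frac{u(\nabla f(z'))}{u(\nabla f(z'))^T v} \right)^T \\
& - 2 \frac{g_{l,v}(x)}{\|\nabla f(z)\|} \left( I - \frac{u(\nabla f(z)) v^T}{v^T u(\nabla f(z))} \right) \frac{\nabla^2 f(z)}{v^T u(\nabla f(z))} \left( I - \frac{u(\nabla f(z)) v^T}{v^T u(\nabla f(z))} \right)^T \\
& + 2 \frac{g_{l,v}(x)}{\|\nabla f(z')\|} \left( I - \frac{u(\nabla f(z')) v^T}{v^T u(\nabla f(z'))} \right) \frac{\nabla^2 f(z')}{v^T u(\nabla f(z'))} \left( I - \frac{u(\nabla f(z')) v^T}{v^T u(\nabla f(z'))} \right)^T,
\end{aligned} \]

where $u(x) = x/\|x\|$. These formulas can be rewritten as finite sums of products
of terms of the form $\frac{1}{v^T u(\nabla f(z(x)))}$, $u(\nabla f(z))$, $\nabla^2 f(z)$, $\frac{g_{l,v}(x)}{\|\nabla f(z)\|}$, and other terms
involving $z'$. We can establish the positive definiteness of the $\nabla^2 g_{l,v}^2(x)$ for $x$ close
to $\bar{x}$ by ensuring that $\|\nabla^2 \hat{g}_{l,v}^2(x) - \nabla^2 g_{l,v}^2(x)\|$ goes to zero. This is immediate from
Proposition 4.7 and Lemmas 4.6 and 4.8.

Applying Proposition 4.1 gives us the result in hand. □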
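For an exact quadratic, the Hessian of the squared parallel distance can be checked directly. The sketch below assumes the closed form $\hat{g}_{l,v}^2(x) = 4\max\{0, [v^T H x]^2 - [v^T H v][x^T H x - 2l]\}/(v^T H v)^2$ obtained from the quadratic-root formula, and compares central finite differences with $\frac{8}{(v^T H v)^2}[H v v^T H - (v^T H v) H]$. The matrix `H`, the direction `v`, the level `l` and the point `x0` are illustrative assumptions.

```python
import numpy as np

def g_hat_sq(x, H, v, l):
    """Squared parallel distance for f_hat(x) = 0.5 x^T H x at level l
    (zero when the line x + t v never enters {f_hat > l})."""
    a = v @ H @ v
    disc = (v @ H @ x) ** 2 - a * (x @ H @ x - 2.0 * l)
    return 4.0 * max(disc, 0.0) / a**2

def hessian_formula(H, v):
    """8 / (v'Hv)^2 * [H v v' H - (v'Hv) H]."""
    a = v @ H @ v
    Hv = H @ v
    return 8.0 / a**2 * (np.outer(Hv, Hv) - a * H)

H = np.diag([3.0, 1.0, -2.0])
v = np.array([0.05, -0.03, 1.0])
v /= np.linalg.norm(v)
l, x0, eps = -0.1, np.array([0.4, -0.2, 0.1]), 1e-4

# Central finite differences of g_hat^2 at a point where g_hat > 0.
n = len(x0)
E = np.eye(n)
fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        fd[i, j] = (g_hat_sq(x0 + eps * E[i] + eps * E[j], H, v, l)
                    - g_hat_sq(x0 + eps * E[i] - eps * E[j], H, v, l)
                    - g_hat_sq(x0 - eps * E[i] + eps * E[j], H, v, l)
                    + g_hat_sq(x0 - eps * E[i] - eps * E[j], H, v, l)) / (4 * eps**2)

formula = hessian_formula(H, v)
eigs = np.linalg.eigvalsh(formula)   # expected to be nonnegative up to rounding
```

The formula does not depend on `l` or `x0`, which is the quadratic-model counterpart of the independence of $l$ discussed in this section.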



Remark 4.10. (The case $l > f(\bar{x})$) A result similar to Theorem 4.9 establishing the
convexity of $g_{l,v}^2(\cdot)$ for $l > f(\bar{x})$ like in Proposition 2.1 would be attractive. But
for $l > f(\bar{x})$, the vectors $z(x)$ and $\hat{z}(x)$ may not exist at all. Yet another issue to
consider is that $g_{l,v}(x)$ may be positive but $\hat{g}_{l,v}(x)$ is zero, or vice versa, making
comparison between $\nabla^2 g_{l,v}^2$ and $\nabla^2 \hat{g}_{l,v}^2$ more difficult. Even if these were not an issue,
$v^T \nabla f(z(x))$ could be zero or be close to zero for some $x$, resulting in a division by
zero in the formulas in Proposition 2.4. Nevertheless, the Hessian $\nabla^2 g_{l,v}^2(\cdot)$ is still
positive definite if $v^T \nabla f(z(x))$ is sufficiently far from $0$. If it turns out that the
Hessian is positive definite whenever $g_{l,v}(x) > 0$, then one can still use Proposition
4.1 to establish convexity.

5. Observations from a numerical experiment 

In this section, we implement a simple version of the mountain pass algorithm
on a two dimensional problem (called the six hump camel back function in [MM04])
defined by

\[ f(x_1, x_2) = (4 - 2.1 x_1^2 + x_1^4/3) x_1^2 + x_1 x_2 + 4(x_2^2 - 1) x_2^2. \tag{5.1} \]

Figure 5.1. The first two iterates of Algorithm 3.3 are operations
(PD) to reduce the parallel distance, marked as (p1) and (p2), and
the third iterate, marked as (a3), is an (Av) operation to adjust the
vector $v$. In this case, the algorithm concentrates its efforts close
to the saddle point near $(-1, 0.8)$. If a different endpoint had been
fixed in (a3) instead, the algorithm might have found the saddle
point $(0, 0)$. The saddle point $(0, 0)$ has a higher value, and would
be of greater interest as the bottleneck.

In our numerical experiments, we only seek graphical evidence from
this two dimensional example that using the parallel distance is a good strategy. We
calculate the Hessians $\nabla^2 f(x)$ at each evaluation. While practical implementations
will not calculate the Hessian, we can study the potential of methods that create
second order models from previous gradient evaluations.
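For readers who want to reproduce the setting, the sketch below codes the function (5.1) with its gradient and Hessian, confirms that the origin is a nondegenerate critical point of Morse index one, and refines the saddle point near $(-1, 0.8)$. The use of plain Newton iterations on $\nabla f = 0$ from the starting point $(-1, 0.8)$ is an assumption made for illustration; it is not the mountain pass algorithm of this paper.

```python
import numpy as np

def f(x):
    x1, x2 = x
    return (4 - 2.1 * x1**2 + x1**4 / 3) * x1**2 + x1 * x2 + 4 * (x2**2 - 1) * x2**2

def grad(x):
    x1, x2 = x
    return np.array([8 * x1 - 8.4 * x1**3 + 2 * x1**5 + x2,
                     x1 + 16 * x2**3 - 8 * x2])

def hess(x):
    x1, x2 = x
    return np.array([[8 - 25.2 * x1**2 + 10 * x1**4, 1.0],
                     [1.0, 48 * x2**2 - 8]])

# The origin: zero gradient, one negative and one positive Hessian eigenvalue.
origin = np.zeros(2)
eig_origin = np.linalg.eigvalsh(hess(origin))

# Newton iterations on grad f = 0, started near the saddle point reported in Figure 5.1.
x = np.array([-1.0, 0.8])
for _ in range(20):
    x = x - np.linalg.solve(hess(x), grad(x))
saddle = x
eig_saddle = np.linalg.eigvalsh(hess(saddle))
```

Newton's method converges to whichever critical point is nearby, so the Hessian eigenvalues at `saddle` are checked afterwards to confirm the Morse index.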

We look at Figure 5.1. One observation that can be made for Algorithm 3.3 is
that while it focuses its computations on a saddle point in two runs of
(PD) and one run of (Av), it did not focus its computations on the saddle point
$(0, 0)$, which has a higher critical value. We can see this phenomenon as part of
the risks involved in trying to zoom computations to a saddle point. Moreover,
this is unavoidable because in a general problem, an optimal mountain pass may
be difficult to find by any method. Furthermore, for this example, when the mountain
pass algorithm is run between the saddle point near $(-1, 0.8)$ and the local
minimizer near $(0.1, -0.7)$, it may find the saddle point $(0, 0)$.

6. Conclusion 

We propose two Principles (P1) and (P2) that a good mountain pass algorithm
should satisfy. We proposed the subroutine (PD) in Algorithm 3.1 to build our
global mountain pass algorithm in Algorithm 3.3, making use of the parallel distance
$g_{l,v}(\cdot)$. Through Proposition 2.1 we see that $g_{l,v}(\cdot)^2$ satisfies (P1'), and that (P2)
follows from work in [LP11]. Sections 2 and 4 discuss how $g_{l,v}(\cdot)^2$ satisfies property
(P1).

Finally, we envision that a robust mountain pass algorithm should include quadratic
model methods, level set methods and path-based methods. For example, the
points chosen for function and gradient evaluations in a level set method should
be such that they provide insight for quadratic model methods and path-based
methods. The right blend of these methods allows them to overcome each other's
shortcomings. The evidence from our numerical experiments so far is encouraging.

Acknowledgement. We thank Jiahao Chen for discussions that led to the main ideas
in this paper and for references to the literature on computing saddle points, and
James Renegar for some helpful discussions.